I am building a deterministic agentic AI ecosystem at Alibaba. I was the chief scientist at a startup (which raised more than $50M), previously worked at JD Explore Academy and Tencent AI Lab, and held an adjunct researcher position at ZJU.
I work on the whole pipeline of LLM R&D and its human-centric applications, including efficient and sufficient training, alignment, evaluation, compression, multilinguality, multimodality, agentic applications, and much more.
I'm keen on bodybuilding (5+ years) and marathon running: I completed my first half marathon (126 min) in Beijing in 2016 and my most recent half marathon (86 min) in Sydney in 2019, and will resume training in 2024.
I (once) enjoyed cooking.
I like to spend Sundays with my cats (two from 2020-2023, one from 2023).
Recent open-source projects in agentic AI (data, evaluation, context) and LLM alignment / policy optimization:
- AgentHER: Hindsight relabeling of failed trajectories for training.
- AgentSynth: Synthetic agent data from scratch with execution validation.
- AdaRubric: Dynamic rubric evaluation for trajectory quality.
- trajectory_tokenization: ReAct with compressed history for long-horizon context.
- SigFibPO: SNR-calibrated trust regions and causal fiber residuals for multi-domain RLVR (research code + verl hook).



