news

Jan 08, 2026 Our work on LLM reasoning behavior is now on ArXiv.
Jan 05, 2026 Our work on multi-turn reward models is now on ArXiv.
Jan 02, 2026 Our work on multi-turn behavior elicitation with RL is now on ArXiv.
Jul 07, 2025 Our Gemini 2.5 Technical Report is released on ArXiv.
May 15, 2025 I will serve as an Area Chair for EMNLP 2025.
May 15, 2025 Our paper T-REG: Preference Optimization with Token-Level Reward Regularization has been accepted to ACL 2025.
Jan 22, 2025 Our paper Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy has been accepted to ICLR 2025.
Sep 18, 2024 Our paper WPO: Enhancing RLHF with Weighted Preference Optimization has been accepted to EMNLP 2024.