Shujian Zhang

Google DeepMind; zhangshujian2023@gmail.com


I am a Research Scientist at Google DeepMind, working on Gemini. My research focuses on natural language processing and machine learning, with a particular emphasis on the post-training of large language models. I am especially interested in instruction tuning, preference modeling, and reinforcement learning from human feedback.

I completed my Ph.D. at the University of Texas at Austin, advised by Prof. Mingyuan Zhou. Prior to that, I obtained my Bachelor’s degree from the University of Rochester. During my Ph.D., I also did some fun internships at Salesforce Research (Summer 2023) and Microsoft Azure AI (Summer 2021 to Winter 2022).

news

Jan 08, 2026 Our work on LLM reasoning behaviors is now on arXiv.
Jan 05, 2026 Our work on multi-turn reward models is now on arXiv.
Jan 02, 2026 Our work on multi-turn behavior elicitation with RL is now on arXiv.
Jul 07, 2025 Our Gemini 2.5 Technical Report is now on arXiv.
May 15, 2025 I will serve as an Area Chair for EMNLP 2025.

selected publications

  1. Preprint
    MUSIC: MUlti-Step Instruction Contrast for Multi-Turn Reward Models
    Wenzhe Li, Shujian Zhang, Wenxuan Zhou, and 5 more authors
    arXiv preprint arXiv:2512.24693, 2025
  2. Preprint
    Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
    Zhenyu Zhang, Shujian Zhang, John Lambert, and 6 more authors
    arXiv preprint arXiv:2512.23988, 2025
  3. Preprint
    Eliciting Behaviors in Multi-Turn Conversations
    Jing Huang, Shujian Zhang, Lun Wang, and 3 more authors
    arXiv preprint arXiv:2512.23701, 2025
  4. Preprint
    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
    Gemini Team
    arXiv preprint arXiv:2507.06261, 2025
  5. ACL 2025
    T-REG: Preference Optimization with Token-Level Reward Regularization
    Wenxuan Zhou, Shujian Zhang, Lingxiao Zhao, and 1 more author
    arXiv preprint arXiv:2412.02685, 2024
  6. ICLR 2025
    Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
    Tong Wu, Shujian Zhang, Kaiqiang Song, and 7 more authors
    arXiv preprint arXiv:2410.09102, 2024
  7. EMNLP 2024
    WPO: Enhancing RLHF with Weighted Preference Optimization
    Wenxuan Zhou, Ravi Agrawal, Shujian Zhang, and 5 more authors
    In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
  8. ICML 2024
    Switchable Decision: Dynamic Neural Generation Networks
    Shujian Zhang, Korawat Tanwisuth, Chengyue Gong, and 2 more authors
    In International Conference on Machine Learning, 2024
  9. Preprint
    AutoML-GPT: Automatic Machine Learning with GPT
    Shujian Zhang, Chengyue Gong, Lemeng Wu, and 2 more authors
    arXiv preprint arXiv:2305.02499, 2023
  10. ICLR 2023
    Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-Oriented Dialogue Systems
    Yihao Feng, Shentao Yang, Shujian Zhang, and 4 more authors
    arXiv preprint arXiv:2302.10342, 2023
  11. EMNLP 2022
    Passage-Mask: A Learnable Regularization Strategy for Retriever-Reader Models
    Shujian Zhang, Chengyue Gong, and Xingchao Liu
    In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
  12. NeurIPS 2022
    A Unified Framework for Alternating Offline Model Training and Policy Learning
    Shentao Yang, Shujian Zhang, Yihao Feng, and 1 more author
    Advances in Neural Information Processing Systems, 2022
  13. ICML 2022
    Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning
    Shentao Yang, Yihao Feng, Shujian Zhang, and 1 more author
    In International Conference on Machine Learning, 2022
  14. EMNLP 2021
    Learning from Uneven Training Data: Unlabeled, Single Label, and Multiple Labels
    Shujian Zhang, Chengyue Gong, and Eunsol Choi
    arXiv e-prints, 2021
  15. ICLR 2021
    Contextual Dropout: An Efficient Sample-Dependent Dropout Module
    Xinjie Fan, Shujian Zhang, Korawat Tanwisuth, and 2 more authors
    arXiv preprint arXiv:2103.04181, 2021
  16. Preprint
    FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization
    Xingchao Liu, Chengyue Gong, Lemeng Wu, and 3 more authors
    arXiv preprint arXiv:2112.01573, 2021
  17. ACL 2021
    Knowing More About Questions Can Help: Improving Calibration in Question Answering
    Shujian Zhang, Chengyue Gong, and Eunsol Choi
    arXiv preprint arXiv:2106.01494, 2021
  18. ICML 2021
    Bayesian Attention Belief Networks
    Shujian Zhang, Xinjie Fan, Bo Chen, and 1 more author
    In International Conference on Machine Learning, 2021
  19. NeurIPS 2020
    Bayesian Attention Modules
    Xinjie Fan, Shujian Zhang, Bo Chen, and 1 more author
    Advances in Neural Information Processing Systems, 2020

service

Area Chair: ACL 2024–2025, EMNLP 2024–2025, ICML 2024–2025, AAAI 2024–2025

Reviewer: AAAI 2022–2024, ACL 2020–2023, EMNLP 2019–2023, NeurIPS 2020–2024, ICLR 2022–2025, ICML 2020–2025