Shujian Zhang

I am a Research Scientist at Google DeepMind, working on Gemini. My research focuses on natural language processing and machine learning, with a particular emphasis on the post-training of large language models. I am especially interested in instruction tuning, preference modeling, and reinforcement learning from human feedback.

I completed my Ph.D. at the University of Texas at Austin, advised by Prof. Mingyuan Zhou. Prior to that, I obtained my Bachelor’s degree from the University of Rochester. During my Ph.D., I also did some fun internships at Salesforce Research (Summer 2023) and Microsoft Azure AI (Summer 2021 - Winter 2022).

news

Jan 08, 2026	Our work on LLM reasoning behavior is now on arXiv.
Jan 05, 2026	Our work on multi-turn reward models is now on arXiv.
Jan 02, 2026	Our work on multi-turn behavior elicitation with RL is now on arXiv.
Jul 07, 2025	Our Gemini 2.5 Technical Report is released on arXiv.
May 15, 2025	I will serve as an Area Chair for EMNLP 2025.

selected publications

Preprint

MUSIC: MUlti-Step Instruction Contrast for Multi-Turn Reward Models

Wenzhe Li, Shujian Zhang, Wenxuan Zhou, and 5 more authors

arXiv preprint arXiv:2512.24693, 2025
Preprint

Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process

Zhenyu Zhang, Shujian Zhang, John Lambert, and 6 more authors

arXiv preprint arXiv:2512.23988, 2025
Preprint

Eliciting Behaviors in Multi-Turn Conversations

Jing Huang, Shujian Zhang, Lun Wang, and 3 more authors

arXiv preprint arXiv:2512.23701, 2025
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gemini Team

arXiv preprint arXiv:2507.06261, 2025
ACL 2025

T-REG: Preference Optimization with Token-Level Reward Regularization

Wenxuan Zhou, Shujian Zhang, Lingxiao Zhao, and 1 more author

arXiv preprint arXiv:2412.02685, 2024
ICLR 2025

Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy

Tong Wu, Shujian Zhang, Kaiqiang Song, and 7 more authors

arXiv preprint arXiv:2410.09102, 2024
EMNLP 2024

WPO: Enhancing RLHF with Weighted Preference Optimization

Wenxuan Zhou, Ravi Agrawal, Shujian Zhang, and 5 more authors

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
ICML 2024

Switchable Decision: Dynamic Neural Generation Networks

Shujian Zhang, Korawat Tanwisuth, Chengyue Gong, and 2 more authors

In International Conference on Machine Learning, 2024
Preprint

AutoML-GPT: Automatic Machine Learning with GPT

Shujian Zhang, Chengyue Gong, Lemeng Wu, and 2 more authors

arXiv preprint arXiv:2305.02499, 2023
ICLR 2023

Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-Oriented Dialogue Systems

Yihao Feng, Shentao Yang, Shujian Zhang, and 4 more authors

arXiv preprint arXiv:2302.10342, 2023
EMNLP 2022

Passage-Mask: A Learnable Regularization Strategy for Retriever-Reader Models

Shujian Zhang, Chengyue Gong, and Xingchao Liu

In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
NeurIPS 2022

A unified framework for alternating offline model training and policy learning

Shentao Yang, Shujian Zhang, Yihao Feng, and 1 more author

Advances in Neural Information Processing Systems, 2022
ICML 2022

Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning

Shentao Yang, Yihao Feng, Shujian Zhang, and 1 more author

In International Conference on Machine Learning, 2022
EMNLP 2021

Learning from uneven training data: Unlabeled, single label, and multiple labels

Shujian Zhang, Chengyue Gong, and Eunsol Choi

arXiv e-prints, 2021
ICLR 2021

Contextual dropout: An efficient sample-dependent dropout module

Xinjie Fan, Shujian Zhang, Korawat Tanwisuth, and 2 more authors

arXiv preprint arXiv:2103.04181, 2021
Preprint

FuseDream: Training-free text-to-image generation with improved CLIP+GAN space optimization

Xingchao Liu, Chengyue Gong, Lemeng Wu, and 3 more authors

arXiv preprint arXiv:2112.01573, 2021
ACL 2021

Knowing more about questions can help: Improving calibration in question answering

Shujian Zhang, Chengyue Gong, and Eunsol Choi

arXiv preprint arXiv:2106.01494, 2021
ICML 2021

Bayesian attention belief networks

Shujian Zhang, Xinjie Fan, Bo Chen, and 1 more author

In International Conference on Machine Learning, 2021
NeurIPS 2020

Bayesian attention modules

Xinjie Fan, Shujian Zhang, Bo Chen, and 1 more author

Advances in Neural Information Processing Systems, 2020

Service

Area Chair: ACL 2024-2025, EMNLP 2024-2025, ICML 2024-2025, AAAI 2024-2025

Reviewer: AAAI 2022-2024, ACL 2020-2023, EMNLP 2019-2023, NeurIPS 2020-2024, ICLR 2022-2025, ICML 2020-2025