Weijian (William) Luo, PhD

Weijian is a RedStar Senior Research Scientist at the Humane Intelligence (hi) lab of Xiaohongshu (RedNote) Inc., Beijing, where he currently leads the research team on large generative understanding foundation models. Weijian obtained his doctoral degree in Statistics and Generative Modeling from the School of Mathematical Sciences, Peking University. He received his M.S. degree in Applied Statistics, also from the School of Mathematical Sciences at Peking University, and his B.S. degree in Mathematics from the University of Science and Technology of China (USTC).

Research Interests: Weijian's early work laid theoretical and practical foundations for modern one-step text-to-image generative models. Currently, Weijian leads the research team on large generative understanding models in hi-lab. His team focuses on developing cutting-edge, efficient generative understanding models that can reason, understand human intentions, and generate vision-audio responses in real time. Weijian also leads research on next-generation generative models, including one-step text-to-image and video models at scale.

Call for Talents: Weijian's team in Beijing is actively hiring talented research scientists and engineers. The team encourages candidates with strong track records and unparalleled curiosity about next-generation generative understanding models to apply for the RedStar Research Scientist program, as well as the ACE intern program.

Academic Services: Weijian serves as a reviewer for academic journals including Nature Communications (NC), Journal of Machine Learning Research (JMLR), IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), IEEE Transactions on Image Processing (TIP), IEEE Transactions on Neural Networks and Learning Systems (TNNLS), and Pattern Recognition (PR). He also reviews for top AI conferences including NeurIPS, ICML, ICLR, CVPR, ICCV, AISTATS, UAI, and ACM-MM.

Contact: pkulwj1994 at icloud dot com

Selected Talks:

Google DeepMind Research invited me to deliver a talk on one-step cross-modality generative models on 12th Nov 2024. The slides are available at A Path to Human-preferred One-step Text-to-image Generative Models.
The 18th X-AGI && China-R Conference invited me to deliver a talk in the multimodal panel on 18th October 2025. The title of the talk is Multimodal Generation and Understanding: the Evolution of Data and Models.
Few-step Diffusion Models Meetup and the Diffusion Circle at the International Conference on Machine Learning (ICML), Vancouver, 14th July 2025.
Research Talk @ Genmo AI, Online, 3rd Jan, 2025: RLHF for Text-to-image Models and Beyond.
Invited Talk @ Biomedical Engineering lab, Peking University, 25th Oct, 2024: Recent Progress on Diffusion Distillations.
Invited Talk @ MAPLE lab, Westlake University, 20th Oct, 2024: Efficient Generative Models.

News:

6th May 2026: one paper is public on arXiv.
Autoregressive Visual Generation Needs a Prologue (Zheng et al., 2026).
A new paradigm for autoregressive (AR) visual generation models.
6th May 2026: one paper is public on arXiv.
Taming the Entropy Cliff: Variable Codebook Size Quantization for Autoregressive Visual Generation (Zheng et al., 2026).
The first study of the Entropy Cliff phenomenon in autoregressive (AR) visual generation models.
30th April 2026: one paper is accepted by ICML 2026, Seoul, South Korea.
TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward (Luo et al., 2026).
The first and best reinforcement learning method for few-step generative models with non-differentiable rewards, delivering a 6B 4-step SoTA model.
30th April 2026: one paper is accepted by ICML 2026, Seoul, South Korea.
Let Language Constrain Geometry: Vision-Language Models as Semantic and Spatial Critics for 3D Generation (Bai et al., 2025).
We introduce key techniques that incorporate VLMs as rewards for both SDS-based 3D generation and feed-forward 3D models.
19th March 2026: one paper is public on arXiv.
Multimodal OCR: Parse Anything from Documents (Zheng et al., 2026).
A state-of-the-art document parsing vision-language model.
15th March 2026: one paper is accepted by TPAMI.
One-step Diffusion and Flow Distillation through Implicit Generator Matching (Huang et al., 2026).
Theory and practice of one-step diffusion and flow distillation via implicit generator matching.
7th March 2026: one paper is public on arXiv.
TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward (Luo et al., 2026).
The first and best reinforcement learning method for few-step generative models with non-differentiable rewards, delivering a 6B 4-step SoTA model.
12th February 2026: one paper is public on arXiv.
ZeroDiff++: Substantial Unseen Visual-semantic Correlation in Zero-shot Learning (Ye et al., 2026).
Better zero-shot learning (ZSL) powered by generative diffusion models.
25th February 2026: one paper is accepted by CVPR 2026.
Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning (Gu et al., 2025).
Principled acceleration and RL for masked autoregressive models.
2nd February 2026: one paper is public on arXiv.
Ultra Fast PDE Solving via Physics Guided Few-step Diffusion (Cindy et al., 2026).
Solving partial differential equations (PDEs) within a second.
25th January 2026: one paper is accepted by ICLR 2026.
Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct (Zheng et al., 2025).
Strong few-step language models that, at the same model size, outperform GPT-2 (1024 NFEs) and dLLMs (1024 NFEs) with only 16 NFEs.
20th December 2025: one paper is accepted by Transactions on Machine Learning Research (TMLR).
Dive3D: Diverse Distillation-based Text-to-3D Generation via Score Implicit Matching (Bai et al., 2025).
Dive3D introduces Score Implicit Matching techniques to text-to-3D generation, significantly improving both generative diversity and quality.
20th December 2025: one paper is public on arXiv.
Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning (Gu et al., 2025).
We introduce principled methods for distillation and reinforcement learning of masked autoregressive models.
20th December 2025: one paper is public on arXiv.
Let Language Constrain Geometry: Vision-Language Models as Semantic and Spatial Critics for 3D Generation (Bai et al., 2025).
We introduce key techniques that incorporate VLMs as rewards for both SDS-based 3D generation and feed-forward 3D models.
1st October 2025: one paper is public on arXiv.
Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct (Zheng et al., 2025).
We introduce DiDi-Instruct, a strong few-step language model that, at the same model size, outperforms GPT-2 (1024 NFEs) and dLLMs (1024 NFEs) with only 16 NFEs.
18th September 2025: one paper is accepted by NeurIPS 2025 @ San Diego and Mexico City.
Reward-Instruct: A Reward-Centric Approach to Fast Photo-Realistic Image Generation (Luo et al., 2025).
We present a novel finding: reward maximization with proper regularizations can effectively train large-scale few-step text-to-image generative models.
18th September 2025: one paper is accepted by NeurIPS 2025 @ San Diego and Mexico City.
Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction (Wang et al., 2025).
Uni-Instruct theoretically unifies more than 10 existing one-step diffusion distillation methods and achieves a SoTA one-step FID of 1.02 on the ImageNet64 generation benchmark.
12th September 2025: one paper is accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
Self-Guidance: Boosting Flow and Diffusion Generation on Their Own (Li et al., 2024).
Congratulations to my mentee, Tiancheng, on his TPAMI acceptance in his first Ph.D. year.
26th August 2025: Introducing dots.VLM1 by hi-lab, a large and versatile vision-language model built upon the DeepSeek-V3 LLM architecture and an internal 1.2B MoE vision encoder (VE). We train the VLM and VE from scratch, resulting in a model on par with leading VLMs on some metrics. arXiv.
Technical report of dots.vlm1 (hi-lab multimodal team).
16th June 2025: one preprint is public on arXiv.
Dive3D: Diverse Distillation-based Text-to-3D Generation via Score Implicit Matching (Bai et al., 2025).
Dive3D introduces Score Implicit Matching techniques to text-to-3D generation, significantly improving both generative diversity and quality.
25th May 2025: one preprint is public on arXiv.
Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction (Wang et al., 2025).
Uni-Instruct theoretically unifies more than 10 existing one-step diffusion distillation methods and achieves a SoTA one-step FID of 1.02 on the ImageNet64 generation benchmark.
4th May 2025: one paper accepted by the International Conference on Machine Learning (ICML) 2025.
Diff-Instruct*: Towards Human-Preferred One-step Text-to-image Generative Models (Luo et al., 2024).
We introduced a novel score-based PPO algorithm for RL fine-tuning of one-step text-to-image generative models. Our open-sourced 0.6B DIstar-SDXL-1step model outperforms the 12B FLUX-dev diffusion model in human preference scores.
19th March 2025: one preprint is public on arXiv.
Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation (Luo et al., 2025).
We present a novel finding: reward maximization with proper regularizations can effectively train large-scale few-step text-to-image generative models.
27th February 2025: one paper accepted by CVPR 2025.
Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation (Ye et al., 2024).
We explore reinforcement learning for training diffusion models, yielding a strong diffusion model with adaptive generation steps.
23rd January 2025: one paper accepted by ICLR 2025.
Consistency Models Made Easy (Geng et al., 2024).
We introduce a set of practical techniques for efficient training of consistency models, together with a comprehensive study on the Scaling Law of consistency models.
5th Dec 2024: one preprint is public on arXiv.
Self-Guidance: Boosting Flow and Diffusion Generation on Their Own (Li et al., 2024).
Self-Guidance improves the quality of human hands and bodies in images generated by diffusion or flow models.
1st Dec 2024: one preprint is public on arXiv.
Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation (Ye et al., 2024).
We introduced an approach for training variable-time-schedule diffusion models using reinforcement learning.
21st Nov 2024: one single-author paper accepted by Transactions on Machine Learning Research (TMLR).
Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences (Luo, 2024).
Diff-Instruct++ is the first work on preference alignment of one-step text-to-image generative models, bridging preference alignment and the distillation of diffusion and flow models.
12th Nov 2024: Delivered an invited talk at the Google DeepMind Diffusion Reading Group titled A Path to Human-preferred One-step Text-to-image Generative Models. Check the [Slides] here.
30th Oct 2024: Invited to give an (internal) online academic talk to the Google DeepMind research team on 12th Nov. The talk title is One-step Text-to-image Generative Models: from Diffusion Distillation to Human-preference Alignment. In this talk, I will share exciting progress in improving the human preference scores of one-step and few-step text-to-image generative models through the lens of Reinforcement Learning from Human Feedback (RLHF). Readers can refer to Diff-Instruct++ and Diff-Instruct* for technical details.
25th Oct 2024: Delivered an invited talk at the Biomedical Engineering lab led by Dr. Sun at Peking University, Beijing, China, on Recent Progress on Diffusion Distillation.
20th Oct 2024: Visited the MAPLE lab led by Dr. Qi at Westlake University, Hangzhou, China, and delivered a talk on Efficient Generative Models to lab members.
18th Oct 2024: one preprint released on arXiv.
One-step Flow Matching Generators (Huang et al., 2024).
We introduce a novel method to distill the flow-matching-based Stable Diffusion 3 model into strong one-step generators.
18th Oct 2024: one preprint released on arXiv.
Diff-Instruct*: Towards Human-Preferred One-step Text-to-image Generative Models (Luo et al., 2024).
This paper introduces Diff-Instruct*, a novel approach to training human-preferred large-scale one-step text-to-image generative models through the lens of online RLHF with general score-based constraints. The resulting one-step 0.6B DiT-DI* model achieves a SoTA HPSv2.0 score of 28.70.
17th Oct 2024: one preprint released on arXiv.
Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences (Luo, 2024).
This paper introduces Diff-Instruct++, the first attempt at human preference alignment of large-scale one-step text-to-image generative models. The aligned one-step 0.6B DiT-DI++ model achieves a leading HPSv2.0 score of 28.48.
14th Oct 2024: I defended my PhD thesis at Peking University. I am humbled and grateful for the love and help of great advisors, family, and awesome friends.
26th Sep 2024: one paper accepted by NeurIPS 2024.
One-step Diffusion Distillation Through Score Implicit Matching (Luo et al., NeurIPS 2024).
We introduce Score Implicit Matching, a novel one-step diffusion distillation approach that yields a strong one-step text-to-image generative model. Many thanks to Prof. Zico Kolter and Prof. Guojun Qi.
20th Jun 2024: one preprint released on arXiv.
Consistency Models Made Easy (Geng et al., 2024).
We introduce a set of practical techniques for efficient training of consistency models, together with a comprehensive study on the Scaling Law of consistency models.
24th Apr 2024: one paper accepted by ICML 2024.
Variational Schrödinger Diffusion Models (Deng et al., ICML 2024).
We introduce an efficient simulation-free Schrödinger diffusion model, with wide applications for image and time-series generation. Congratulations to Yixin and Dr. Deng.
26th Sep 2023: one paper accepted by NeurIPS 2023.
Diff-Instruct: A Universal Approach for Transferring Knowledge from Pre-trained Diffusion Models (Luo et al., NeurIPS 2023).
Diff-Instruct is a one-step diffusion distillation approach through the lens of distribution matching, with applications to text-to-3D generation and improving GAN generators.
26th Sep 2023: one paper accepted by NeurIPS 2023.
Entropy-based Training Methods for Scalable Neural Implicit Samplers (Luo et al., NeurIPS 2023).
We introduced two training approaches for neural implicit samplers, termed KL training and Fisher training.
26th Sep 2023: one paper accepted by NeurIPS 2023.
SA-Solver: Stochastic Adams Solver for Fast Sampling of Diffusion Models (Xue et al., NeurIPS 2023).
We introduced a novel diffusion sampler based on stochastic Adams methods, integrated into PixArt-α diffusion models.
26th Sep 2023: one paper accepted by NeurIPS 2023.
Enhancing Adversarial Robustness via Score-based Optimization (Zhang et al., NeurIPS 2023).
We introduced a novel optimization-based adversarial defense built on pre-trained diffusion models.
9th Apr 2023: one paper released on arXiv.
A Comprehensive Survey on Knowledge Distillation of Diffusion Models (Luo, 2023).
The first survey on diffusion distillation and knowledge transfer of diffusion models.

Friends with whom I have worked on projects:

J. Zico Kolter, Professor, Director of the Machine Learning Department, Carnegie Mellon University (CMU).
Guo-jun Qi, Professor, IEEE Fellow, Director of MAPLE Lab of Westlake University.
Guang Lin, Associate Dean for Research, Moses Cobb Stevens Professor in Mathematics and Mechanical Engineering, Purdue University.
Kenji Kawaguchi, Presidential Young Professor, Department of Computer Science, National University of Singapore.
Zhenguo Li, Director of the AI Theory Lab, Huawei Noah’s Ark Lab, Hong Kong; Adjunct Professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology.
He Sun, PhD, tenure-track Assistant Professor at Peking University.
Tianyang Hu, PhD, incoming Assistant Professor at the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen).
Wenzheng Chen, PhD, tenure-track Assistant Professor at Peking University.
Wei Deng, PhD, Senior Research Scientist at Morgan Stanley, New York.
Ricky Tian Qi Chen, PhD, Research Scientist at Meta Fundamental AI Research (FAIR), New York.
Zheyuan Hu, PhD from the National University of Singapore, winner of the NeurIPS 2024 Best Paper Award.
Seth Forsgren, BS from Princeton, CEO and founder of producer.ai, San Francisco. producer.ai was acquired by Google DeepMind.
Hayk Martiros, MS from Stanford, CTO and co-founder of producer.ai, technical VP of Skydio.
Debing Zhang, PhD, Director of the Artificial General Intelligence (AGI) team of RedNote, aka Xiaohongshu Inc.

Previous students whom I have advised, hosted, or worked with:

Weimin Bai, PhD student at Peking University, ACE top-talent intern at hi-lab, Xiaohongshu, co-advised with Professor He Sun.
Yujian Chen, undergraduate student of Computer Science, Peking University, co-advised with Professor Wenzheng Chen.
Jiajun Zha, PhD student at the Hong Kong University of Science and Technology (HKUST), co-advised with Professor Harry Liu.
Haoyang Zheng, PhD student at Purdue University, co-advised with Professor Guang Lin.
Bowen Zheng, PhD student at the Chinese University of Hong Kong (Shenzhen), co-advised with Professor Tianyang Hu.
Cindy Xiangrui Kong, MS from Carnegie Mellon University (CMU), PhD student at Purdue University, co-advised with Professor Guang Lin.
Junyi Wu, PhD student at Purdue University, co-advised with Professor Guang Lin.
Yongzhao Chao, MS student at the Chinese University of Hong Kong (Shenzhen), co-advised with Professor Tianyang Hu.
Yubo Li, undergraduate student at Tsinghua University and incoming PhD student at Peking University, co-advised with Professor He Sun.
Yihong Luo, PhD student at HKUST.
Harry Liu, PhD student at Purdue University, co-advised with Professor Guang Lin.
Yifei Wang, undergraduate student at Peking University and incoming PhD student at Rice University (Houston), co-advised with Professor He Sun.
Kaihang Pan, PhD student at Zhejiang University, winner of the CVPR 2025 Best Student Paper Honorable Mention.
Le Zhuo, incoming CS PhD student of MMLab at the Chinese University of Hong Kong.
Zemin Huang, CS PhD student in the joint PhD program of Zhejiang University and Westlake University, co-advised with Professor Guo-jun Qi.
Tiancheng Li, CS PhD student in the joint PhD program of Zhejiang University and Westlake University, co-advised with Professor Guo-jun Qi.
Zilyu Ye, CS undergraduate student at South China University of Technology, ByteDance Top Seed Intern, co-advised with Professor Guo-jun Qi.
Yuxuan Gu, incoming M.S. student at Peking University, co-advised with Professor He Sun.
Chaowei Liu, National University of Singapore (NUS), co-advised with Professor Guo-jun Qi.