Weijian Luo, PhD
"What important truth do very few people agree with you on?"
Weijian is a RedStar Senior Research Scientist at the Humane Intelligence (hi) lab of Xiaohongshu (RedNote) Inc., Beijing. Currently, he leads the research team for large generative understanding foundation models at hi-lab. Weijian obtained his doctoral degree in Statistics and Generative Modeling from the School of Mathematical Sciences, Peking University. He received his M.S. degree in Applied Statistics, also from the School of Mathematical Sciences at Peking University, and his B.S. degree in Mathematics from the University of Science and Technology of China (USTC).
Research Interests: Weijian's early work established theory and practice for modern one-step text-to-image generative models. Currently, Weijian leads the research team for large generative understanding models at hi-lab. His team focuses on developing cutting-edge, efficient generative understanding models that can reason, understand human intentions, and generate vision-audio responses in real time. Weijian also leads the research direction of next-generation generative models, including one-step text-to-image and video models at scale.
Call for Talents: Weijian's team in Beijing is actively hiring talented research scientists and engineers. The team encourages candidates with strong track records and unparalleled curiosity about next-gen generative understanding models to apply for the RedStar Research Scientist program, as well as the ACE intern program.
Academic Services: Weijian serves as an invited reviewer for academic journals including Nature Communications (NC), Journal of Machine Learning Research (JMLR), IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), IEEE Transactions on Image Processing (TIP), IEEE Transactions on Neural Networks and Learning Systems (TNNLS), and Pattern Recognition (PR). He also reviews for top AI conferences including NeurIPS, ICML, ICLR, CVPR, ICCV, AISTATS, UAI, and ACM-MM, among others.
Contact: pkulwj1994 at icloud dot com
Selected Talks:
- Google DeepMind Research invited me to deliver a talk on 12th Nov 2024 on one-step cross-modality generative models. Please check out the slides: A Path to Human-preferred One-step Text-to-image Generative Models.
- The 18th X-AGI && China-R Conference invited me to deliver a talk in the multimodal panel on 18th October 2025. The title of the talk is Multimodal Generation and Understanding: the Evolution of Data and Models.
- Few-step Diffusion Models meetup and the Diffusion Circle, at the International Conference on Machine Learning (ICML), 14th July 2025, Vancouver.
- Research Talk @ Genmo AI, Online, 3rd Jan, 2025: RLHF for Text-to-image Models and Beyond.
- Invited Talk @ Biomedical Engineering lab, Peking University, 25th Oct, 2024: Recent Progress on Diffusion Distillations.
- Invited Talk @ MAPLE lab, Westlake University, 20th Oct, 2024: Efficient Generative Models.
News:
- 6th May 2026: one paper is public on arXiv. Autoregressive Visual Generation Needs a Prologue (Zheng et al., 2026). A new paradigm for auto-regressive (AR) visual generation models.
- 6th May 2026: one paper is public on arXiv. Taming the Entropy Cliff: Variable Codebook Size Quantization for Autoregressive Visual Generation (Zheng et al., 2026). The first study of the Entropy Cliff phenomenon in auto-regressive (AR) visual generation models.
- 30th April 2026: one paper is accepted by ICML 2026, Seoul, South Korea. TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward (Luo et al., 2026). The first and best reinforcement learning method for few-step generative models using non-differentiable rewards, with a 6B 4-step SoTA model.
- 30th April 2026: one paper is accepted by ICML 2026, Seoul, South Korea. Let Language Constrain Geometry: Vision-Language Models as Semantic and Spatial Critics for 3D Generation (Bai et al., 2025). We introduce important techniques that incorporate VLMs as rewards for both SDS-based 3D generation and feedforward 3D models.
- 19th March 2026: one paper is public on arXiv. Multimodal OCR: Parse Anything from Documents (Zheng et al., 2026). A state-of-the-art vision-language model for document parsing.
- 15th March 2026: one paper is accepted by TPAMI. One-step Diffusion and Flow Distillation through Implicit Generator Matching (Huang et al., 2026). Theory and practice of one-step diffusion generator matching.
- 7th March 2026: one paper is public on arXiv. TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward (Luo et al., 2026). The first and best reinforcement learning method for few-step generative models using non-differentiable rewards, with a 6B 4-step SoTA model.
- 12th February 2026: one paper is public on arXiv. ZeroDiff++: Substantial Unseen Visual-semantic Correlation in Zero-shot Learning (Ye et al., 2026). Better zero-shot learning (ZSL) powered by generative diffusion.
- 25th February 2026: one paper is accepted by CVPR 2026. Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning (Gu et al., 2025). Principles for acceleration and RL of masked auto-regressive models.
- 2nd February 2026: one paper is public on arXiv. Ultra Fast PDE Solving via Physics Guided Few-step Diffusion (Cindy et al., 2026). Solving partial differential equations (PDEs) within a second.
- 25th January 2026: one paper is accepted by ICLR 2026. Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct (Zheng et al., 2025). Strong few-step language models that outperform GPT2 (1024 NFE) and dLLMs (1024 NFE) with only 16 NFEs, at the same model size.
- 20th December 2025: one paper is accepted by Transactions on Machine Learning Research (TMLR). Dive3D: Diverse Distillation-based Text-to-3D Generation via Score Implicit Matching (Bai et al., 2025). Dive3D introduces Score Implicit Matching techniques to text-to-3D generation, significantly improving both generative diversity and quality.
- 20th December 2025: one paper is public on arXiv. Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning (Gu et al., 2025). We introduce principled methods for distillation and reinforcement learning of masked auto-regressive models.
- 20th December 2025: one paper is public on arXiv. Let Language Constrain Geometry: Vision-Language Models as Semantic and Spatial Critics for 3D Generation (Bai et al., 2025). We introduce important techniques that incorporate VLMs as rewards for both SDS-based 3D generation and feedforward 3D models.
- 1st October 2025: one paper is public on arXiv. Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct (Zheng et al., 2025). We introduce DiDi-Instruct, a very strong few-step language model that outperforms GPT2 (1024 NFE) and dLLMs (1024 NFE) with only 16 NFEs, at the same model size.
- 18th September 2025: one paper is accepted by NeurIPS 2025 (San Diego and Mexico City). Reward-Instruct: A Reward-Centric Approach to Fast Photo-Realistic Image Generation (Luo et al., 2025). We present a novel finding: reward maximization with proper regularizations can effectively train large-scale few-step text-to-image generative models.
- 18th September 2025: one paper is accepted by NeurIPS 2025 (San Diego and Mexico City). Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction (Wang et al., 2025). Uni-Instruct unifies over 10 existing one-step diffusion distillation methods in theory, with a SoTA one-step FID of 1.02 on the ImageNet64 generation benchmark.
- 12th September 2025: one paper is accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Self-Guidance: Boosting Flow and Diffusion Generation on Their Own (Li et al., 2024). Congratulations to my mentee, Tiancheng, on a TPAMI acceptance in his first Ph.D. year.
- 26th August 2025: Introducing dots.VLM1 by hi-lab: a large and versatile vision-language model built upon the DeepSeek V3 LLM architecture and an internal 1.2B MoE Vision Encoder. We train the VLM and VE from scratch, resulting in a model on par with leading VLMs on some metrics. arXiv. Technical report of dots.vlm1 (hi-lab multimodal team).
- 16th June 2025: one preprint is public on arXiv. Dive3D: Diverse Distillation-based Text-to-3D Generation via Score Implicit Matching (Bai et al., 2025). Dive3D introduces Score Implicit Matching techniques to text-to-3D generation, significantly improving both generative diversity and quality.
- 25th May 2025: one preprint is public on arXiv. Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction (Wang et al., 2025). Uni-Instruct unifies over 10 existing one-step diffusion distillation methods in theory, with a SoTA one-step FID of 1.02 on the ImageNet64 generation benchmark.
- 4th May 2025: one paper accepted by the International Conference on Machine Learning (ICML) 2025. Diff-Instruct*: Towards Human-Preferred One-step Text-to-image Generative Models (Luo et al., 2024). We introduced a novel score-based PPO algorithm for RL fine-tuning of one-step text-to-image generative models. Our open-sourced 0.6B DIstar-SDXL-1step model outperforms the 12B FLUX-dev diffusion model in human preference scores.
- 19th March 2025: one preprint is public on arXiv. Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation (Luo et al., 2025). We present a novel finding: reward maximization with proper regularizations can effectively train large-scale few-step text-to-image generative models.
- 27th February 2025: one paper accepted by CVPR 2025. Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation (Ye et al., 2024). We explore the use of reinforcement learning for training diffusion models, yielding a very strong diffusion model with adaptive generation steps.
- 23rd January 2025: one paper accepted by ICLR 2025. Consistency Models Made Easy (Geng et al., 2024). We introduce a set of practical techniques for efficient training of consistency models, together with a comprehensive study on the scaling law of consistency models.
- 5th Dec 2024: one preprint is public on arXiv. Self-Guidance: Boosting Flow and Diffusion Generation on Their Own (Li et al., 2024). Self-guidance improves the rendering of human hands and bodies in images generated by diffusion or flow models.
- 1st Dec 2024: one preprint is public on arXiv. Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation (Ye et al., 2024). We introduced an approach for training variable-time-schedule diffusion models using reinforcement learning.
- 21st Nov 2024: one single-author paper accepted by Transactions on Machine Learning Research (TMLR). Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences (Luo, 2024). Diff-Instruct++ is the first work on preference alignment of one-step text-to-image generative models, opening up preference alignment for the distillation of diffusion and flow models.
- 12th Nov 2024: Delivered an invited talk at the Google DeepMind Diffusion Reading Group titled A Path to Human-preferred One-step Text-to-image Generative Models. Check the [Slides] here.
- 30th Oct 2024: Invited to give an (internal) online academic talk to the Google DeepMind research team on 12th Nov. The talk title is One-step Text-to-image Generative Models: from Diffusion Distillation to Human-preference Alignment. In this talk, I will share some exciting progress in improving human preferences for one-step and few-step text-to-image generative models through the lens of Reinforcement Learning from Human Feedback (RLHF). Readers can refer to Diff-Instruct++ and Diff-Instruct* for technical details.
- 25th Oct 2024: Delivered an invited talk at the Biomedical Engineering lab led by Dr. Sun at Peking University, Beijing, China. The talk was on Recent Progress on Diffusion Distillation.
- 20th Oct 2024: Made an academic visit to the MAPLE lab led by Dr. Qi at Westlake University, Hangzhou, China, and delivered a talk on Efficient Generative Models to lab members.
- 18th Oct 2024: one preprint released on arXiv. One-step Flow Matching Generators (Huang et al., 2024). We introduce a novel method to distill the flow-matching-based Stable Diffusion 3 model into strong one-step generators.
- 18th Oct 2024: one preprint released on arXiv. Diff-Instruct*: Towards Human-Preferred One-step Text-to-image Generative Models (Luo et al., 2024). This paper introduces Diff-Instruct*, a novel approach to training human-preferred large-scale one-step text-to-image generative models through the lens of online RLHF with general score-based constraints. The resulting one-step 0.6B DiT-DI* model achieves a SoTA HPSv2.0 score of 28.70.
- 17th Oct 2024: one preprint released on arXiv. Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences (Luo, 2024). This paper introduces Diff-Instruct++, the first attempt at human preference alignment of large-scale one-step text-to-image generative models. The aligned one-step 0.6B DiT-DI++ model achieves a leading HPSv2.0 score of 28.48.
- 14th Oct 2024: I defended my PhD thesis at Peking University. I feel humbled and grateful to be loved and helped by great advisors, family, and awesome friends.
- 26th Sep 2024: one paper accepted by NeurIPS 2024. One-step Diffusion Distillation Through Score Implicit Matching (Luo et al., NeurIPS 2024). We introduce score implicit matching, a novel one-step diffusion distillation approach, along with an amazing one-step text-to-image generative model. Appreciation to Prof. Zico Kolter and Prof. Guo-jun Qi.
- 20th Jun 2024: one preprint released on arXiv. Consistency Models Made Easy (Geng et al., 2024). We introduce a set of practical techniques for efficient training of consistency models, together with a comprehensive study on the scaling law of consistency models.
- 24th Apr 2024: one paper accepted by ICML 2024. Variational Schrödinger Diffusion Models (Deng et al., ICML 2024). We introduce an efficient simulation-free Schrödinger diffusion model, with wide applications for image and time-series generation. Congratulations to Yixin and Dr. Deng.
- 26th Sep 2023: one paper accepted by NeurIPS 2023. Diff-Instruct: A Universal Approach for Transferring Knowledge from Pre-trained Diffusion Models (Luo et al., NeurIPS 2023). Diff-Instruct is a one-step diffusion distillation approach through the lens of distribution matching, with applications to text-to-3D generation and improving GAN generators.
- 26th Sep 2023: one paper accepted by NeurIPS 2023. Entropy-based Training Methods for Scalable Neural Implicit Samplers (Luo et al., NeurIPS 2023). We introduced two interesting training approaches for neural implicit samplers, termed KL training and Fisher training.
- 26th Sep 2023: one paper accepted by NeurIPS 2023. SA-Solver: Stochastic Adams Solver for Fast Sampling of Diffusion Models (Xue et al., NeurIPS 2023). We introduced a novel diffusion sampler based on the Stochastic Adams theory, integrated into PixArt-alpha diffusion models.
- 26th Sep 2023: one paper accepted by NeurIPS 2023. Enhancing Adversarial Robustness via Score-based Optimization (Zhang et al., NeurIPS 2023). We introduced a novel optimization-based adversarial defense based on pre-trained diffusion models.
- 9th Apr 2023: one paper released on arXiv. A Comprehensive Survey on Knowledge Distillation of Diffusion Models (Luo, 2023). The first survey on diffusion distillation and knowledge transfer of diffusion models.
Friends with whom I have worked on projects:
- J. Zico Kolter, Professor, Director of the Machine Learning Department, Carnegie Mellon University (CMU).
- Guo-jun Qi, Professor, IEEE Fellow, Director of the MAPLE Lab at Westlake University.
- Guang Lin, Associate Dean for Research, Moses Cobb Stevens Professor in Mathematics and Mechanical Engineering, Purdue University.
- Kenji Kawaguchi, Presidential Young Professor at the Department of Computer Science, National University of Singapore.
- Zhenguo Li, Director of the AI Theory Lab at Huawei Noah's Ark Lab, Hong Kong, and Adjunct Professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology.
- He Sun, PhD, tenure-track Assistant Professor at Peking University.
- Tianyang Hu, PhD, Incoming Assistant Professor at the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen).
- Wenzheng Chen, PhD, tenure-track Assistant Professor at Peking University.
- Wei Deng, PhD, Senior Research Scientist at Morgan Stanley, New York.
- Ricky Tian Qi Chen, PhD, Research Scientist at Meta Fundamental AI Research (FAIR), New York.
- Zheyuan Hu, PhD from the National University of Singapore, winner of the NeurIPS 2024 Best Paper Award.
- Seth Forsgren, BS from Princeton, CEO and founder of producer.ai, San Francisco. producer.ai was acquired by Google DeepMind.
- Hayk Martiros, MS from Stanford, CTO and co-founder of producer.ai, technical VP of Skydio.
- Debing Zhang, PhD, Director of the Artificial General Intelligence (AGI) team at RedNote, aka Xiaohongshu Inc.
Previous students whom I have advised, hosted, or worked with:
- Weimin Bai, PhD student at Peking University, ACE top-talent intern at hi-lab, Xiaohongshu, co-advised with Professor He Sun.
- Yujian Chen, undergraduate student of Computer Science, Peking University, co-advised with Professor Wenzheng Chen.
- Jiajun Zha, PhD student at the Hong Kong University of Science and Technology (HKUST), co-advised with Professor Harry Liu.
- Haoyang Zheng, PhD student at Purdue University, co-advised with Professor Guang Lin.
- Bowen Zheng, PhD student at the Chinese University of Hong Kong (Shenzhen), co-advised with Professor Tianyang Hu.
- Cindy Xiangrui Kong, MS from Carnegie Mellon University (CMU), PhD student at Purdue University, co-advised with Professor Guang Lin.
- Junyi Wu, PhD student at Purdue University, co-advised with Professor Guang Lin.
- Yongzhao Chao, MS student at the Chinese University of Hong Kong (Shenzhen), co-advised with Professor Tianyang Hu.
- Yubo Li, undergraduate student at Tsinghua University and incoming PhD student at Peking University, co-advised with Professor He Sun.
- Yihong Luo, PhD student at HKUST.
- Harry Liu, PhD student at Purdue University, co-advised with Professor Guang Lin.
- Yifei Wang, undergraduate student at Peking University and incoming PhD student at Rice University (Houston), co-advised with Professor He Sun.
- Kaihang Pan, PhD student at Zhejiang University, winner of the CVPR 2025 Best Student Paper Honorable Mention.
- Le Zhuo, incoming CS PhD student at MMLab at the Chinese University of Hong Kong.
- Zemin Huang, CS PhD student in the joint PhD program of Zhejiang University and Westlake University, co-advised with Professor Guo-jun Qi.
- Tiancheng Li, CS PhD student in the joint PhD program of Zhejiang University and Westlake University, co-advised with Professor Guo-jun Qi.
- Zilyu Ye, CS undergraduate student at South China University of Technology, ByteDance Top Seed Intern, co-advised with Professor Guo-jun Qi.
- Yuxuan Gu, incoming M.S. student at Peking University, co-advised with Professor He Sun.
- Chaowei Liu, National University of Singapore (NUS), co-advised with Professor Guo-jun Qi.