Weijian Luo, PhD

"What important truth do very few people agree with you on?"
— Peter Thiel

Weijian is a RedStar Senior Research Scientist at the Humane Intelligence (hi) lab of Xiaohongshu (RedNote) Inc., Beijing. Currently, he leads the research team for large generative understanding foundation models in hi-lab. Weijian obtained his doctoral degree in Statistics and Generative Modeling from the School of Mathematical Sciences, Peking University. He received his M.S. degree in Applied Statistics, also from the School of Mathematical Sciences at Peking University, and his B.S. degree in Mathematics from the University of Science and Technology of China (USTC).

Research Interests: Weijian's early work set out the theory and practice behind modern one-step text-to-image generative models. Currently, Weijian leads the research team for large generative understanding models in hi-lab. His team focuses on developing cutting-edge, efficient generative understanding models that can reason, understand human intentions, and generate vision-audio responses in real time. Weijian also leads the research direction of next-generation generative models, including one-step text-to-image and video models at scale.

Call for Talents: Weijian's team in Beijing is actively hiring talented research scientists and engineers. The team encourages candidates with strong track records and unparalleled curiosity about next-gen generative understanding models to apply for the RedStar Research Scientist program, as well as the ACE intern program.

Academic Services: Weijian serves as an invited reviewer for academic journals including Nature Communications (NC), Journal of Machine Learning Research (JMLR), IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), IEEE Transactions on Image Processing (TIP), IEEE Transactions on Neural Networks and Learning Systems (TNNLS), and Pattern Recognition (PR). He also reviews for top AI conferences including NeurIPS, ICML, ICLR, CVPR, ICCV, AISTATS, UAI, ACM-MM, etc.

Contact: pkulwj1994 at icloud dot com

Selected Talks:

News:

  • 6th May 2026: one paper is public on Arxiv.
    Autoregressive Visual Generation Needs a Prologue (Zheng et al., 2026) .
    A new paradigm for autoregressive (AR) visual generation models.
  • 6th May 2026: one paper is public on Arxiv.
    Taming the Entropy Cliff: Variable Codebook Size Quantization for Autoregressive Visual Generation (Zheng et al., 2026) .
    The first study of the Entropy Cliff phenomenon in autoregressive (AR) visual generation models.
  • 30th April 2026: one paper is accepted by ICML 2026, Seoul, South Korea.
    TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward (Luo et al., 2026) .
    The first and best reinforcement learning method for few-step generative models using non-differentiable rewards, with a 6B 4-step SoTA model.
  • 30th April 2026: one paper is accepted by ICML 2026, Seoul, South Korea.
    Let Language Constrain Geometry: Vision-Language Models as Semantic and Spatial Critics for 3D Generation (Bai et al., 2025) .
    We introduce important techniques that incorporate VLM as rewards for both SDS-based 3D generation and feedforward 3D models.
  • 19th March 2026: one paper is public on Arxiv.
    Multimodal OCR: Parse Anything from Documents (Zheng et al., 2026) .
    State-of-the-art document-parsing vision-language model.
  • 15th March 2026: one paper is accepted by TPAMI
    One-step Diffusion and Flow Distillation through Implicit Generator Matching (Huang et al., 2026) .
    Theory and practices of one-step diffusion generator matching.
  • 7th March 2026: one paper is public on Arxiv
    TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward (Luo et al., 2026) .
    The first and best reinforcement learning method for few-step generative models using non-differentiable rewards, with a 6B 4-step SoTA model.
  • 12th February 2026: one paper is public on Arxiv
    ZeroDiff++: Substantial Unseen Visual-semantic Correlation in Zero-shot Learning (Ye et al., 2026) .
    Better Zero Shot Learning (ZSL) powered by generative diffusion.
  • 25th February 2026: one paper is accepted by CVPR 2026
    Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning (Gu et al., 2025) .
    Principles for acceleration and RL of Masked Auto-regressive Models.
  • 2nd February 2026: one paper is public on Arxiv
    Ultra Fast PDE Solving via Physics Guided Few-step Diffusion (Cindy et al., 2026) .
    Solving partial differential equations (PDEs) within a second.
  • 25th January 2026: one paper is accepted by ICLR 2026
    Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct (Zheng et al., 2025) .
    Strong few-step language models that outperform GPT2 (1024 NFE) and dLLMs (1024 NFE) with only 16 NFEs, at the same model size.
  • 20th December, 2025: one paper is accepted by Transactions on Machine Learning Research (TMLR) .
    Dive3D: Diverse Distillation-based Text-to-3D Generation via Score Implicit Matching (Bai et al., 2025).
    Dive3D introduces Score-implicit Matching techniques to text-to-3D generation, which significantly improves generative diversity as well as quality.
  • 20th December 2025: one paper is public on Arxiv
    Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning (Gu et al., 2025) .
    We introduce principal methods for distillation and reinforcement learning of Masked Auto-regressive Models.
  • 20th December 2025: one paper is public on Arxiv
    Let Language Constrain Geometry: Vision-Language Models as Semantic and Spatial Critics for 3D Generation (Bai et al., 2025) .
    We introduce important techniques that incorporate VLM as rewards for both SDS-based 3D generation and feedforward 3D models.
  • 1st October 2025: one paper is public on Arxiv
    Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct (Zheng et al., 2025) .
    We introduce DiDi-Instruct, a very strong few-step language model that outperforms GPT2 (1024 NFE) and dLLMs (1024 NFE) with only 16 NFEs, at the same model size.
  • 18th September 2025: one paper is accepted by NeurIPS 2025 @ San Diego and Mexico City.
    Reward-Instruct: A Reward-Centric Approach to Fast Photo-Realistic Image Generation (Luo et al., 2025) .
    We present a novel finding: reward maximization with proper regularizations can effectively train large-scale few-step text-to-image generative models.
  • 18th September, 2025: one paper is accepted by NeurIPS 2025 @ San Diego and Mexico City.
    Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction (Wang et al., 2025).
    Uni-Instruct unifies over 10 existing one-step diffusion distillation methods in theory, with an absolute SoTA one-step FID of 1.02 on the ImageNet64 generation benchmark.
  • 12th September, 2025: one paper is accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
    Self-Guidance: Boosting Flow and Diffusion Generation on Their Own (Li et al., 2024) .
    Congratulations to my mentee student, Tiancheng, for getting a TPAMI acceptance in his first Ph.D. year.
  • 26th August, 2025: Introducing the dots.VLM1 by hi-lab: a large and versatile vision-language model built upon DeepseekV3 LLM architecture and an internal 1.2B MoE Vision Encoder. We train the VLM and VE from scratch, resulting in a model on par with leading VLMs on some metrics. Arxiv.
    Technical report of the dots.vlm1. (hi-lab multimodal team)
  • 16th June, 2025: one preprint paper is public on Arxiv .
    Dive3D: Diverse Distillation-based Text-to-3D Generation via Score Implicit Matching (Bai et al., 2025).
    Dive3D introduces Score-implicit Matching techniques to text-to-3D generation, which significantly improves generative diversity as well as quality.
  • 25th May, 2025: one preprint paper is public on Arxiv .
    Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction (Wang et al., 2025).
    Uni-Instruct unifies over 10 existing one-step diffusion distillation methods in theory, with an absolute SoTA one-step FID of 1.02 on the ImageNet64 generation benchmark.
  • 4th May, 2025: one paper accepted by the International Conference on Machine Learning (ICML) 2025.
    Diff-Instruct*: Towards Human-Preferred One-step Text-to-image Generative Models (Luo et al., 2024).
    We introduced a novel score-based PPO algorithm for RL fine-tuning of 1-step text-to-image generative models. Our open-sourced 0.6B DIstar-SDXL-1step model outperforms the 12B FLUX-dev diffusion model in human preference scores.
  • 19th March 2025: one preprint paper is public on Arxiv .
    Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation (Luo et al., 2025) .
    We present a novel finding: reward maximization with proper regularizations can effectively train large-scale few-step text-to-image generative models.
  • 27th February 2025: one paper accepted by CVPR 2025 .
    Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation (Ye et al., 2024) .
    We explore using reinforcement learning to train diffusion models, yielding a very strong diffusion model with adaptive generation steps.
  • 23rd January 2025: one paper accepted by ICLR 2025.
    Consistency Models Made Easy (Geng et al., 2024).
    We introduce a set of practical techniques for efficient training of consistency models, together with a comprehensive study on the Scaling Law of consistency models.
  • 5th Dec 2024: One pre-print is public on Arxiv .
    Self-Guidance: Boosting Flow and Diffusion Generation on Their Own (Li et al., 2024) .
    Self-guidance can improve human hands and bodies of images generated by diffusion or flow models.
  • 1st Dec 2024: One pre-print is public on Arxiv .
    Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation (Ye et al., 2024) .
    We introduced an approach for training variable-time-schedule diffusion models using reinforcement learning.
  • 21st Nov 2024: One single-author paper accepted by Transactions on Machine Learning Research (TMLR).
    Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences (Luo, 2024) .
    Diff-Instruct++ is the first work on preference alignment of one-step text-to-image generative models, opening up preference alignment for distilled diffusion and flow models.
  • 12th Nov 2024: Delivered an invited talk at the Google DeepMind Diffusion Reading Group titled A Path to Human-preferred One-step Text-to-image Generative Models. Check the [Slides] here.
  • 30th Oct 2024: Invited to give an (internal) online academic talk to the Google DeepMind research team on 12th Nov. The talk title is One-step Text-to-image Generative Models: from Diffusion Distillation to Human-preference Alignment. In this talk, I will share some exciting progress in improving human preferences for one-step and few-step text-to-image generative models through the lens of Reinforcement Learning from Human Feedback (RLHF). Readers can refer to Diff-Instruct++ and Diff-Instruct* for technical details.
  • 25th Oct 2024: Delivered an invited talk at the Biomedical Engineering lab led by Dr. Sun at Peking University, Beijing, China. The talk is on Recent Progresses on Diffusion Distillation.
  • 20th Oct 2024: Made an academic visit to the MAPLE lab led by Dr. Qi at Westlake University, Hangzhou, China. Delivered a talk on Efficient Generative Models to lab members.
  • 18th Oct 2024: one preprint released on Arxiv.
    One-step Flow Matching Generators (Huang et al., 2024) .
    We introduce a novel method to distill the flow-matching-based Stable Diffusion 3 model into strong one-step generators.
  • 18th Oct 2024: one preprint released on Arxiv.
    Diff-Instruct*: Towards Human-Preferred One-step Text-to-image Generative Models (Luo et al., 2024) .
    This paper introduces Diff-Instruct*, a novel approach to train human-preferred large-scale one-step text-to-image generative models through the lens of online RLHF with general score-based constraints. The resulting one-step 0.6B DiT-DI* model achieves a SoTA HPSv2.0 score of 28.70.
  • 17th Oct 2024: one preprint released on Arxiv.
    Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences (Luo, 2024) .
    This paper introduces Diff-Instruct++, the first attempt at human preference alignment of large-scale one-step text-to-image generative models. The aligned one-step 0.6B DiT-DI++ model achieves a leading HPSv2.0 score of 28.48.
  • 14th Oct 2024: I defended my PhD thesis on 14th Oct at Peking University. I feel humbled and grateful to be loved and helped by great advisors, family, and awesome friends.
  • 26th Sep 2024: one paper accepted by NeurIPS 2024 .
    One-step Diffusion Distillation Through Score Implicit Matching (Luo et al., NeurIPS 2024) .
    We introduce score implicit matching, a novel one-step diffusion distillation approach that yields an amazing one-step text-to-image generative model. Appreciation to Prof. Zico Kolter and Prof. Guojun Qi.
  • 20th Jun 2024: one preprint released on Arxiv .
    Consistency Models Made Easy (Geng et al., 2024) .
    We introduce a set of practical techniques for efficient training of consistency models, together with a comprehensive study on the Scaling Law of consistency models.
  • 24th Apr 2024: one paper accepted by ICML 2024 .
    Variational Schrödinger Diffusion Models (Deng et al., ICML 2024) .
    We introduce an efficient simulation-free Schrödinger diffusion model, with wide applications for image and time-series generation. Congratulations to Yixin and Dr. Deng.
  • 26th Sep 2023: one paper accepted by NeurIPS 2023.
    Diff-instruct: A Universal Approach for Transferring Knowledge from Pre-trained Diffusion Models (Luo et al., NeurIPS 2023) .
    Diff-Instruct is a one-step diffusion distillation approach through the lens of distribution matching, with applications on text-to-3D generation and improving GAN generators.
  • 26th Sep 2023: one paper accepted by NeurIPS 2023 .
    Entropy-based Training Methods for Scalable Neural Implicit Samplers (Luo et al., NeurIPS 2023) .
    We introduced two interesting training approaches for neural implicit samplers termed KL and Fisher training.
  • 26th Sep 2023: one paper accepted by NeurIPS 2023 .
    SA-Solver: Stochastic Adams Solver for Fast Sampling of Diffusion Models (Xue et al., NeurIPS 2023) .
    We introduced a novel diffusion sampler based on Stochastic Adams theory, integrated into PixArt-alpha diffusion models.
  • 26th Sep 2023: one paper accepted by NeurIPS 2023 .
    Enhancing Adversarial Robustness via Score-based Optimization (Zhang et al., NeurIPS 2023) .
    We introduced a novel optimization-based adversarial defense based on pre-trained diffusion models.
  • 9th Apr 2023: one paper released on Arxiv .
    A Comprehensive Survey on Knowledge Distillation of Diffusion Models (Luo, 2023) .
    The first survey on diffusion distillation and knowledge transfer of diffusion models.

Friends with whom I have worked on projects:

Previous students whom I have advised, hosted, or worked with:

  • Weimin Bai , PhD student at Peking University, ACE top-talent intern at hi-Lab, Xiaohongshu, co-advised with Professor He Sun.
  • Yujian Chen , undergraduate student of Computer Science, Peking University, co-advised with Professor Wenzheng Chen.
  • Jiajun Zha , PhD student of the Hong Kong University of Science and Technology (HKUST), co-advised with Professor Harry Liu.
  • Haoyang Zheng , PhD student at Purdue University, co-advised with Professor Guang Lin.
  • Bowen Zheng , PhD student at Chinese University of Hong Kong (Shenzhen), co-advised with Professor Tianyang Hu.
  • Cindy Xiangrui Kong , MS from Carnegie Mellon University (CMU), PhD student at Purdue University, co-advised with Professor Guang Lin.
  • Junyi Wu , PhD student at Purdue University, co-advised with Professor Guang Lin.
  • Yongzhao Chao , MS student at Chinese University of Hong Kong (Shenzhen), co-advised with Professor Tianyang Hu.
  • Yubo Li , Undergraduate student at Tsinghua University, and incoming PhD student at Peking University, co-advised with Professor He Sun.
  • Yihong Luo , PhD student at HKUST.
  • Harry Liu , PhD student at Purdue University, co-advised with Professor Guang Lin.
  • Yifei Wang , Undergraduate student at Peking University, and incoming PhD student at Rice University (Houston). Co-advised with Professor He Sun.
  • Kaihang Pan , PhD student at Zhejiang University. Winner of the CVPR 2025 Best Student Paper Honorable Mention.
  • Le Zhuo , Incoming CS PhD student of MMLab at the Chinese University of Hong Kong.
  • Zemin Huang , CS PhD student of the joint PhD Program of Zhejiang University and Westlake University, co-advised with Professor Guo-jun Qi.
  • Tiancheng Li , CS PhD student of the joint PhD Program of Zhejiang University and Westlake University, co-advised with Professor Guo-jun Qi.
  • Zilyu Ye , CS undergraduate student of South China University of Technology, ByteDance Top Seed Intern, co-advised with Professor Guo-jun Qi.
  • Yuxuan Gu , Incoming M.S. student at Peking University, co-advised with Professor He Sun.
  • Chaowei Liu , National University of Singapore (NUS), co-advised with Professor Guo-jun Qi.