Faculty & Researchers


Dr. Conghui He

Conghui He (何聪辉)

Young Scientist and Project Investigator (PI) at Shanghai Artificial Intelligence Laboratory

Previously Senior Researcher at WeChat (developed Plato).

Ph.D. from Tsinghua University (2013-2018), B.S. from Sun Yat-sen University (2009-2013).

Email: heconghui@pjlab.org.cn

Research Interests

High-Performance Computing, Computer Vision, Large Language Models, Data-Centric AI, Pre-training Data Preparation, Multimodal Learning

Awards

  • 2025 ACL Best Theme Paper Award
  • 2025 U35 Shanghai Technology Youth 35 Leading Program
  • 2025 WAIC Yunfan Award 'Brilliant Star'
  • 2023 SenseTime Award (Top 1 team out of 100, Highest Award)
  • 2021 SenseTime Outstanding Team Award (Top 10 teams out of 200)
  • 2019 Tencent Technology Breakthrough Award - Gold (Top 1 team out of 50, Highest Technical Award)
  • 2018 Outstanding Doctoral Graduate Award, Tsinghua University
  • 2017 ACM Gordon Bell Prize (Highest honor in HPC applications)
  • 2013 IEEE-IBM Smarter Planet Challenge Global Winner (Team Leader, 1/54)

Key Research, Projects & Reports

Selected Publications

  • Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching, CVPR 2025
  • OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations, CVPR 2025
  • GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training, ICLR 2025
  • OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text, ICLR 2025
  • MMBench: Is Your Multi-modal Model an All-around Player? ECCV 2024
  • ShareGPT4V: Improving Large Multi-modal Models with Better Captions, ECCV 2024
  • InternLM2 Technical Report, arXiv 2024
  • 18.9-Pflops nonlinear earthquake simulation on Sunway TaihuLight: enabling depiction of 18-Hz and 8-meter scenarios, SC 2017
Dr. Lijun Wu

Lijun Wu (吴郦军)

Young Scientist, Shanghai Artificial Intelligence Laboratory.

Formerly Research Scientist at ByteDance, Senior Researcher at Microsoft Research Asia (MSRA).

Email: lijun_wu@outlook.com

Research Interests

LLM (post-training, RLHF), Synthetic Data Optimization, AI4Science (LLM4Science, Drug Discovery)

Awards

  • 2025 Selected for Shanghai Magnolia Plan
  • 2025 NeurIPS CURE-Bench Internal Reasoning Track - 2nd place
  • 2024 ACL Language+Molecule - 1st and 2nd place in two tracks
  • 2021 OGB-LSC@KDD Cup - Runner-up
  • 2019 WMT Global Machine Translation Competition - 8 Track Championships
  • 2018 MSRA Ph.D. Fellowship
  • 2013 IEEE-IBM Smarter Planet Challenge Global Winner

Key Research, Projects & Reports

Selected Publications

  • OpenDataArena: A Fair and Open Arena for Benchmarking Post-training Dataset Value, arXiv 2025
  • Tokenizing 3D Molecule Structure with Quantized Spherical Coordinates, KDD 2026
  • Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning, NeurIPS 2025
  • Nature Language Model: Deciphering the Language of Nature for Scientific Discovery, arXiv 2025
  • 3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization, ICLR 2025
  • FABind+: Enhancing Molecular Docking through Improved Pocket Prediction and Pose Generation, KDD 2025
  • Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey, arXiv 2024
  • Target-aware Molecule Generation for Drug Design Using a Chemical Language Model, Nature Communications, 2024
  • Randomness Regularization with Simple Consistency Training for Neural Networks, TPAMI 2024
  • A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond, TPAMI 2024
  • BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning, ACL 2024 findings
  • The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4, arXiv 2023
  • BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations, EMNLP 2023
  • FABind: Fast and Accurate Protein-Ligand Binding, NeurIPS 2023
  • Unified 2D and 3D Pre-Training of Molecular Representations, KDD 2022
  • R-Drop: Regularized Dropout for Neural Networks, NeurIPS 2021
  • Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection, NeurIPS 2020
  • Incorporating BERT into Neural Machine Translation, ICLR 2020
  • Exploiting Monolingual Data at Scale for Neural Machine Translation, EMNLP 2019
  • A Study of Reinforcement Learning for Neural Machine Translation, EMNLP 2018
Dr. Bin Wang

Bin Wang (王斌)

Young Scientist, Shanghai Artificial Intelligence Laboratory.

Ph.D. from University of Chinese Academy of Sciences (UCAS Scholar).

Algorithm Lead for MinerU project.

Email: wangbin@pjlab.org.cn

Research Interests

Intelligent Document Parsing and Understanding, Data Autonomous Iteration Agents, Multimodal Large Models

Awards

  • 2025 Selected for Shanghai Dongfang Talent Top Project
  • 2020 Chinese Academy of Sciences Li Yuehua Excellent Doctoral Student (Top 5%)
  • 2016 ILSVRC (ImageNet) VID Video Object Detection Task - 3rd Place Globally

Key Research, Projects & Reports

Research Focus Areas

  • Intelligent Document Parsing & Understanding: Developing practical algorithms for layout detection, table recognition, chemical element recognition, geometric parsing, etc., for RAG and AI4S.
  • Multimodal Large Models: Focusing on vertical domain multimodal large models using data-centric algorithms, generative models, and reinforcement learning to address OOD problems.
  • Data Autonomous Iteration Agents: Using agent technology to automate data iteration processes (quality improvement, distribution balancing, safety validation) for efficient AI model training.

Selected Publications

  • MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing, arXiv 2025
  • OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation, ICCV 2025
  • Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching, CVPR 2025
  • OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations, CVPR 2025
  • GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training, ICLR 2025
  • OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text, ICLR 2025
  • MinerU: An Open-Source Solution for Precise Document Content Extraction, arXiv 2024
  • InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD, NeurIPS 2024
  • Parrot Captions Teach CLIP to Spot Text, ECCV 2024
  • VIGC: Visual Instruction Generation and Correction, AAAI 2024
Dr. Jiang Wu

Jiang Wu (吴江)

Young Scientist, Shanghai Artificial Intelligence Laboratory.

B.S. and Ph.D. from Tsinghua University.

Email: wujiang@pjlab.org.cn

Research Interests

Large Language Models, Multimodal Large Models, Intelligent Document Parsing and Understanding

Awards

  • Led the development of an industry-leading satellite imagery analysis system that set new technical benchmarks and has been deployed in multiple satellite and surveying centers.

Key Research, Projects & Reports

Selected Publications

  • Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations, ACL 2024
  • VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis, AAAI 2025
  • Utilize the Flow Before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning, AAAI 2025
  • GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation, NAACL 2025 findings
  • OpenHuEval: Evaluating Large Language Model on Hungarian Specifics, ACL 2025 findings
  • PM4Bench: Benchmarking Large Vision-Language Models with Parallel Multilingual Multi-Modal Multi-task Corpus, arXiv 2025
  • GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition, arXiv 2025
  • RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning, arXiv 2025
Dr. Jiantao Qiu

Jiantao Qiu (邱剑涛)

Young Researcher, Shanghai Artificial Intelligence Laboratory.

B.S. and Ph.D. in Electronic Engineering from Tsinghua University.

Email: qiujiantao@pjlab.org.cn

Research Interests

Large Language Model Datasets, HTML Document Understanding, Energy-Efficient Neural Network Accelerator Design, Multi-Machine Collaborative System Design

Awards

  • 2022 AI2000 Global AI Scholar, Third Place in FPGA Rising Star List

Key Research, Projects & Reports

  • Language Model-based Parsing Dataset: AICC: Parse HTML Finer, Make Models Better -- A 7.3T AI-Ready Corpus Built by a Model-Based HTML Parser
  • Language Model-based Web Parsing Tool: Dripper: Token-Efficient Main HTML Extraction with a Lightweight LM
  • Large-scale Pre-training Dataset: WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset

Selected Publications

  • Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification, NeurIPS 2025
  • Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning, NeurIPS 2025
  • Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining, ACL 2025
  • Modeling and optimization study on sulfamethoxazole degradation by electrochemically activated persulfate process, JCP 2018
  • Angel-Eye: A Complete Design Flow for Mapping CNN onto Embedded FPGA, IEEE TCAD 2017
  • Going Deeper with Embedded FPGA Platform for Convolutional Neural Network, FPGA 2016
Dr. Wentao Zhang

Wentao Zhang (张文涛)

Assistant Professor, Researcher, and Doctoral Supervisor at the International Machine Learning Research Center, Peking University.
Research Consultant at Shanghai Artificial Intelligence Laboratory.

Formerly at Tencent Machine Learning Platform Department, Apple AIML, and Mila - Quebec AI Institute.

Email: zhangwentao1@pjlab.org.cn

Research Interests

Data-centric machine learning and large model data governance.

Awards

  • WWW'22 Best Student Paper Award (1/1822)
  • AP-Web'23 Best Paper Runner Up Award
  • CIKM'24 Best Student Full Paper Award (1/1496)
  • 2021 Apple Scholar (sole recipient in Asia-Pacific)
  • WAIC Yunfan Award (1 of 15 globally)
  • 2023 Chinese Institute of Electronics Science and Technology Progress Award (First Prize)
  • 2023 Peking University/Beijing Municipal/Chinese Association for Artificial Intelligence Excellent Doctoral Dissertation Award
  • 2024 Peking University "Weiming Young Scholar"
  • 2024 World Internet Conference Leading Scientific and Technological Achievement Award
  • 2024 Huawei Spark Award
  • 2025 Beijing High-Level Innovation and Entrepreneurship Talent Support Program - Youth Talent
  • 2025 ACM SIGMOD China Rising Star Award

Key Research, Projects & Reports

  • DataFlow: LLM data preparation system, including data acquisition, processing, and quality assessment; Technical Report; Video Tutorial; Text Tutorial
  • DataFlow-MM: Extends DataFlow capabilities to multimodal scenarios including audio, images, and videos; Text Tutorial
  • DataFlow-Table: Extends DataFlow capabilities to structured table data, including intelligent data retrieval, processing, and analysis (coming soon)
  • DataFlow-KG: Extends DataFlow capabilities to graph-structured data, such as KG extraction and KG-based SFT data synthesis (coming soon)
  • DataFlow Agent: Lowers the barrier to using DataFlow, automates operator and pipeline generation and optimization
  • DataFlex: Data-centric LLM training framework, dynamically selects and configures data during training; Text Tutorial
  • MinerU: Parses PDFs into Markdown/JSON format data that LLMs can use
  • SceneFlow: Extends DataFlow capabilities to World Model scenarios (coming soon)
  • AgentFlow: First Agent data synthesis framework including RAG, MM-RAG, DeepResearch, Code, GUI, and other environments (coming soon)
  • Paper2Any: Application built on DataFlow-Agent, including scientific drawing and PPT generation; Online Experience

Selected Publications

  • DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI, arXiv 2025
  • MaintainCoder: Maintainable Code Generation Under Dynamic Requirements, NeurIPS 2025
  • Multi-step Visual Reasoning with Visual Tokens Scaling and Verification, NeurIPS 2025
  • Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning, ICLR 2025
  • Towards Precise Scaling Laws for Video Diffusion Transformers, CVPR 2025
  • Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models, NeurIPS 2024
  • Physics-guided Active Sample Reweighting for Urban Flow Prediction, CIKM 2024 Best Student Full Paper
  • PaSca: a Graph Neural Architecture Search System under the Scalable Paradigm, WWW 2022 Best Student Paper
Dr. Weijia Li

Weijia Li (李唯嘉)

Associate Professor and Doctoral Supervisor, Tsinghua Shenzhen International Graduate School.
Research Consultant, Shanghai Artificial Intelligence Laboratory.

B.S. from Sun Yat-sen University, Ph.D. from Tsinghua University, Postdoc at MMLab, CUHK.

Email: liweijia@pjlab.org.cn

Research Interests

Multimodal Large Models, Generative Models, Agent Systems, Remote Sensing and Urban Applications

Key Research, Projects & Reports

Selected Publications

Multimodal Large Models:

  • LEGION: Learning to Ground and Explain for Synthetic Image Detection, ICCV 2025 Highlight
  • Spot the Fake: Large Multimodal Model-based Synthetic Image Detection with Artifact Explanation, NeurIPS 2025
  • BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception, NeurIPS 2025
  • LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models, ICLR 2025 Spotlight

Generative Models:

  • RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards, arXiv 2025
  • Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation, arXiv 2025
  • GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT-4o in Image Generation, arXiv 2025
  • SkyDiffusion: Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm, ICCV 2025

Remote Sensing and Urban Applications:

  • Earth-Agent: Unlocking the Full Landscape of Earth Observation with Agents, arXiv 2025
  • UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-view Urban Scenarios, AAAI 2025
  • Where am I? Cross-View Geo-localization with Natural Language Descriptions, ICCV 2025

Joint Training Students

Hengrui Kang (康恒锐)

Joint Ph.D. Program: Shanghai Jiao Tong University (SJTU) & Shanghai AI Laboratory

Undergraduate university: University of Electronic Science and Technology of China (UESTC)

Grade: 2nd-year Ph.D. student (Starting Sep 2024)

Research Interests: Synthetic data detection, intelligent document parsing and generation

Jiahe Song (宋家和)

Joint Ph.D. Program: School of AI, Shanghai Jiao Tong University (SJTU) & Shanghai AI Laboratory

Undergraduate university: Peking University

Grade: 1st-year Ph.D. student (Starting Sep 2025)

Research Interests: Multimodal large models, AI for science

Honglin Lin (林泓霖)

Joint Ph.D. Program: School of AI, Shanghai Jiao Tong University (SJTU) & Shanghai AI Laboratory

Undergraduate university: Beijing University of Posts and Telecommunications (BUPT)

Grade: 1st-year Ph.D. student (Starting Sep 2025)

Research Interests: Mathematical reasoning in large models, data synthesis, etc.

Junbo Niu (牛俊博)

Joint Ph.D. Program: Peking University (PKU) & Shanghai AI Laboratory

Undergraduate university: Beihang University

Grade: 1st-year Ph.D. student (Starting Sep 2025)

Research Interests: Multimodal Understanding (Video Understanding, OCR) & Data-Centric Machine Learning

Xin Gao (高鑫)

Joint Ph.D. Program: Shanghai Jiao Tong University (SJTU) & Shanghai AI Laboratory

Undergraduate university: University of Electronic Science and Technology of China (UESTC)

Grade: 1st-year Ph.D. student (Starting Sep 2025)

Research Interests: Data synthesis, evaluation, and filtering for large models, etc.

Yu Li (李宇)

Joint Ph.D. Program: University of Science and Technology of China (USTC) & Shanghai AI Laboratory

Undergraduate university: Wuhan University

Grade: 1st-year Ph.D. student (Starting Sep 2025)

Research Interests: Logical reasoning in large models, data synthesis, etc.

Zichen Wen (温子辰)

Joint Ph.D. Program: Shanghai AI Laboratory & Shanghai Jiao Tong University (SJTU)

Undergraduate university: University of Electronic Science and Technology of China (UESTC)

Grade: 1st-year Ph.D. student (Starting Sep 2025)

Research Interests: Efficient AI (including Lightweight and Efficient Large Models for Language/Multimodality, and Data-Efficient Artificial Intelligence)

Zhanping Zhong (钟展平)

Joint Ph.D. Program: Shanghai Jiao Tong University (SJTU) & Shanghai AI Laboratory

Undergraduate university: Beihang University

Grade: 1st-year Ph.D. student (Starting Sep 2025)

Research Interests: LLM agent, data synthesis, data selection, etc.

Xiaoran Shang (尚萧然)

Joint Ph.D. Program: University of Science and Technology of China (USTC) & Shanghai AI Laboratory

Undergraduate university: Wuhan University

Grade: Incoming Ph.D. student (Starting Sep 2026)

Research Interests: Multimodal Large Models, data synthesis, data selection, etc.

Shoupeng Wang (王首鹏)

Joint Ph.D. Program: University of Science and Technology of China (USTC) & Shanghai AI Laboratory

Undergraduate university: Wuhan University

Grade: Incoming Ph.D. student (Starting Sep 2026)

Hejun Dong (董和军)

Joint Ph.D. Program: The Chinese University of Hong Kong & Shanghai AI Laboratory

Undergraduate university: Beihang University

Grade: Incoming Ph.D. student (Starting Sep 2026)

Jie Yang (杨杰)

Joint Ph.D. Program: Shanghai Jiao Tong University (SJTU) & Shanghai AI Laboratory

Undergraduate university: Wuhan University

Grade: Incoming Ph.D. student (Starting Sep 2026)

Chuang Wang (王闯)

Joint Ph.D. Program: Shanghai Jiao Tong University (SJTU) & Shanghai AI Laboratory

Undergraduate university: Beihang University

Grade: Incoming Ph.D. student (Starting Sep 2026)

Research Interests: Multimodal large models, AI for science

Profile: https://chuangwang123.github.io/

Jutao Xiao (肖举涛)

Joint Ph.D. Program: Zhejiang University (ZJU) & Shanghai AI Laboratory

Undergraduate university: Northeastern University

Grade: Incoming Ph.D. student (Starting Sep 2026)

Wei Li (李薇)

Joint Ph.D. Program: East China Normal University (ECNU) & Shanghai AI Laboratory

Master's university: Shanghai Jiao Tong University

Grade: 1st-year Ph.D. student (Starting Sep 2025)