Faculty & Researchers
Conghui He (何聪辉)
Young Scientist and Project Investigator (PI) at Shanghai Artificial Intelligence Laboratory
Previously Senior Researcher at WeChat (developed Plato).
Ph.D. from Tsinghua University (2013-2018), B.S. from Sun Yat-sen University (2009-2013).
Email: heconghui@pjlab.org.cn
Research Interests
High-Performance Computing, Computer Vision, Large Language Models, Data-Centric AI, Pre-training Data Preparation, Multimodal Learning
Awards
- 2023 SenseTime Award (1st of 100 teams)
- 2021 SenseTime Outstanding Team Award (top 10 of 200 teams)
- 2019 Tencent Technology Breakthrough Award, Gold (1st of 50 teams)
- 2018 Outstanding Doctoral Graduate Award
- 2017 ACM Gordon Bell Prize (Highest honor in HPC applications)
- 2013 IEEE-IBM Smarter Planet Challenge Global Winner (Team Leader, 1/54)
Key Research, Projects & Reports
- OpenDataLab: An open platform with 7700+ datasets, serving 40k+ developers.
- MinerU: A one-stop, open-source, high-quality data extraction tool for PDF, web, and e-books.
- InternLM: Series of 7B and 20B foundation and chat models.
- PDF-Extract-Kit: A comprehensive library for high-quality PDF content extraction.
- Report: "A Conversation with Conghui He of Shanghai AI Lab: The Importance of Data as Seen from DeepSeek, Achieving Outsized Results at Low Cost"
- Report: "China's Supercomputer Wins the Gordon Bell Prize Again; the Results Offer Lessons for Earthquake Prediction Research"
Selected Publications
- Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching, CVPR 2025
- OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations, CVPR 2025
- GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training, ICLR 2025
- OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text, ICLR 2025
- MMBench: Is Your Multi-Modal Model an All-Around Player? ECCV 2024
- ShareGPT4V: Improving Large Multi-Modal Models with Better Captions, ECCV 2024
- 18.9-Pflops nonlinear earthquake simulation on Sunway TaihuLight: enabling depiction of 18-Hz and 8-meter scenarios, SC 2017
Lijun Wu (吴郦军)
Young Scientist, Shanghai Artificial Intelligence Laboratory.
Formerly Research Scientist at ByteDance and Senior Researcher at Microsoft Research Asia (MSRA).
Email: lijun_wu@outlook.com
Research Interests
LLM (post-training, RLHF), Synthetic Data Optimization, AI4Science (LLM4Science, Drug Discovery)
Awards
- 2013 IEEE-IBM Smarter Planet Challenge Global Winner
- 2018 MSRA Ph.D. Fellowship
- 2019 WMT Global Machine Translation Competition - 8 Track Championships
- 2021 OGB-LSC@KDD Cup - Runner up
- 2024 ACL Language + Molecule - 1st and 2nd place in two tracks
Key Research, Projects & Reports
- Report: "2018: The World's First Chinese-English Machine Translation System to Achieve Human Parity"
- Report: "A study of reinforcement learning for neural machine translation"
- Report: "WMT 2019 International Machine Translation Competition: Microsoft Research Asia Takes the Championship with 8 First-Place Finishes"
- Report: "R-Drop: A simple and effective regularization method to correct the defects of Dropout"
- Report: "The Latest Survey on Non-Autoregressive Generation: Nearly 200 References Reveal Challenges and Future Directions"
- Report: "A 230-Page Study Covering 5 Major Scientific Fields: The Microsoft Team Uses GPT-4 to Explore the Impact of LLMs on Scientific Discovery"
- Report: "Accelerating Drug Discovery: TamGen, a Target-Aware Molecule Generator Based on Generative AI"
- Report: "NatureLM: A Cross-Domain AI Foundation Model Driving Scientific Discovery and Innovation"
Selected Publications
- Nature Language Model: Deciphering the Language of Nature for Scientific Discovery, arXiv 2025
- 3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization, ICLR 2025
- FABind+: Enhancing Molecular Docking through Improved Pocket Prediction and Pose Generation, KDD 2025
- Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey, arXiv 2024
- Target-aware Molecule Generation for Drug Design Using a Chemical Language Model, Nature Communications 2024
- Randomness Regularization with Simple Consistency Training for Neural Networks, TPAMI 2024
- A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond, TPAMI 2024
- BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations, EMNLP 2023
- FABind: Fast and Accurate Protein-Ligand Binding, NeurIPS 2023
- Unified 2D and 3D Pre-Training of Molecular Representations, KDD 2022
- R-Drop: Regularized Dropout for Neural Networks, NeurIPS 2021
- Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection, NeurIPS 2020
- Incorporating BERT into Neural Machine Translation, ICLR 2020
- Exploiting Monolingual Data at Scale for Neural Machine Translation, EMNLP 2019
- A Study of Reinforcement Learning for Neural Machine Translation, EMNLP 2018
Bin Wang (王斌)
Young Scientist, Shanghai Artificial Intelligence Laboratory.
Ph.D. from University of Chinese Academy of Sciences (UCAS Scholar).
Algorithm Lead for MinerU project.
Email: wangbin@pjlab.org.cn
Research Interests
Intelligent Document Parsing and Understanding, Data Autonomous Iteration Agents, Multimodal Large Models
Awards
- ImageNet Large Scale Visual Recognition Challenge (ILSVRC2016 VID) - 3rd Place Globally
- UCAS Zhu Li Yuehua Excellent Doctoral Scholarship
Key Research & Projects
- MinerU: https://github.com/opendatalab/MinerU
- PDF-Extract-Kit: https://github.com/opendatalab/PDF-Extract-Kit
- DocLayout-YOLO: https://github.com/opendatalab/DocLayout-YOLO
- OmniDocBench: https://github.com/opendatalab/OmniDocBench
Research Focus Areas
- Intelligent Document Parsing & Understanding: Developing practical algorithms for layout detection, table recognition, chemical element recognition, geometric parsing, etc., for RAG and AI4S.
- Multimodal Large Models: Focusing on vertical domain multimodal large models using data-centric algorithms, generative models, and reinforcement learning to address OOD problems.
- Data Autonomous Iteration Agents: Using agent technology to automate data iteration processes (quality improvement, distribution balancing, safety validation) for efficient AI model training.
Selected Publications
- Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching, CVPR 2025
- OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations, CVPR 2025
- GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training, ICLR 2025
- OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text, ICLR 2025
- MinerU: An Open-Source Solution for Precise Document Content Extraction, arXiv 2024
- InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD, NeurIPS 2024
- Parrot Captions Teach CLIP to Spot Text, ECCV 2024
- VIGC: Visual Instruction Generation and Correction, AAAI 2024
Jiang Wu (吴江)
Young Scientist, Shanghai Artificial Intelligence Laboratory.
B.S. and Ph.D. from Tsinghua University.
Email: wujiang@pjlab.org.cn
Research Interests
Large Language Models, Multimodal Large Models, Intelligent Document Parsing and Understanding
Awards
- Led the development of an industry-leading satellite imagery analysis system, setting new technical benchmarks, and deployed it in multiple satellite and surveying centers.
Selected Publications
- Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations, ACL 2024
- VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis, AAAI 2025
- Utilize the Flow Before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning, AAAI 2025
- GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation, NAACL 2025 Findings
- OpenHuEval: Evaluating Large Language Model on Hungarian Specifics, arXiv 2025
- PM4Bench: A Parallel Multilingual Multi-Modal Multi-task Benchmark for Large Vision Language Model, arXiv 2025
Jiantao Qiu (邱剑涛)
Young Researcher, Shanghai Artificial Intelligence Laboratory.
B.S. and Ph.D. in Electronic Engineering from Tsinghua University.
Email: qiujiantao@pjlab.org.cn
Research Interests
Large Language Model Datasets, HTML Document Understanding, Energy-Efficient Neural Network Accelerator Design, Multi-Machine Collaborative System Design
Awards
- AI 2000 Most Influential Scholar Award, Honorable Mention in AAAI/IJCAI (2023, for work on FPGAs); top 3 rising star in FPGA research
Selected Publications
- Going Deeper with Embedded FPGA Platform for Convolutional Neural Network, FPGA 2016
Wentao Zhang (张文涛)
Assistant Professor, Researcher, and Doctoral Supervisor at the International Machine Learning Research Center, Peking University.
Research Consultant at Shanghai Artificial Intelligence Laboratory.
Formerly at Tencent Machine Learning Platform Department, Apple AIML, and Mila - Quebec AI Institute.
Email: zhangwentao1@pjlab.org.cn
Research Interests
Data-centric machine learning and large model data governance.
Awards
- WWW'22 Best Student Paper Award (1/1822)
- AP-Web'23 Best Paper Runner Up Award
- CIKM'24 Best Student Full Paper Award (1/1496)
- Apple Scholar (2021, sole recipient in Asia-Pacific)
- World Artificial Intelligence Conference (WAIC) Yunfan Award (1 of 15 globally)
- Peking University/Beijing Municipal/Chinese Association for Artificial Intelligence Excellent Doctoral Dissertation Award, 2023
- Peking University "Weiming Young Scholar", 2024
- World Internet Conference Leading Scientific and Technological Achievement Award, 2024
- Huawei Spark Award, 2024
- Chinese Institute of Electronics Science and Technology Progress Award (First Prize), 2023
Key Research, Projects & Reports
- Angel: a high-performance distributed machine learning and graph computing platform, jointly designed by Tencent and PKU.
- SGL: a scalable graph learning toolkit for extremely large graph datasets.
- MindWare: a powerful AutoML system, which automates feature engineering, algorithm selection and hyperparameter tuning.
- OpenBox: an efficient open-source system designed for solving generalized black-box optimization (BBO) problems.
Selected Publications
- PAS: Data-Efficient Plug-and-Play Prompt Augmentation System, ICDE 2025
- DataSculpt: Crafting Data Landscapes for Long-Context LLMs through Multi-Objective Partitioning, ICDE 2025
- Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning, ICLR 2025
- Towards Precise Scaling Laws for Video Diffusion Transformers, CVPR 2025
- Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models, NeurIPS 2024
- Physics-guided Active Sample Reweighting for Urban Flow Prediction, CIKM 2024 Best Student Full Paper
- PaSca: a Graph Neural Architecture Search System under the Scalable Paradigm, WWW 2022 Best Student Paper
- RIM: Reliable Influence-based Active Learning on Graphs, NeurIPS 2021 Spotlight
Weijia Li (李唯嘉)
Associate Professor ("Hundred Talents Program"), Sun Yat-sen University.
Research Consultant, Shanghai Artificial Intelligence Laboratory.
B.S. from Sun Yat-sen University, Ph.D. from Tsinghua University, Postdoc at MMLab, CUHK.
Email: liweijia@pjlab.org.cn
Research Interests
Multimodal Large Models, Image Generation, Synthetic Data Detection, AI4Earth
Key Research, Projects & Reports
- Report: "Has GPT-4o's Image Generation Architecture Been 'Cracked'? An Autoregressive Backbone with a Diffusion Decoder, Plus a Comprehensive Benchmark for 4o Image Generation"
- Report: "ICLR 2025 Spotlight | Synthetic Data Camouflage vs. the Sharp Eyes of Large Models: SYSU & Shanghai AI Lab Propose LOKI, a Benchmark for Synthetic Data Detection"
- Report: "Benchmarking Embodied Large Models in Complex Urban Environments! UrBench: A Comprehensive Benchmark for Evaluating Multimodal Large Models in Multi-View Urban Scenarios"
- Report: "Paper Review | CVPR 2024 | Combining Satellite and Street-View Imagery for Fine-Grained Building Attribute Segmentation, Selected as a Highlight!"
Selected Publications
- LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models (ICLR 2025, Spotlight)
- UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios (AAAI 2025)
- GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT-4o in Image Generation (arXiv 2025)
- LEGION: Learning to Ground and Explain for Synthetic Image Detection (arXiv 2025)
- Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation (arXiv 2025)
- SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation (CVPR 2024 Highlight)
- 3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions (CVPR 2024)
- Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network (ECCV 2024)
- OmniCity: Omnipotent City Understanding with Multi-Level and Multi-View Images (CVPR 2023)
Joint Training Students
Hengrui Kang (康恒锐)
Joint Ph.D. Program: Shanghai Jiao Tong University (SJTU) & Shanghai AI Laboratory
Undergraduate university: University of Electronic Science and Technology of China (UESTC)
Grade: 2nd-year Ph.D. student (Starting Sep 2024)
Research Interests: Synthetic data detection, intelligent document parsing and generation
Jiahe Song (宋家和)
Joint Ph.D. Program: School of AI, Shanghai Jiao Tong University (SJTU) & Shanghai AI Laboratory
Undergraduate university: Peking University
Grade: 1st-year Ph.D. student (Starting Sep 2025)
Research Interests: Multimodal large models, AI for science
Honglin Lin (林泓霖)
Joint Ph.D. Program: School of AI, Shanghai Jiao Tong University (SJTU) & Shanghai AI Laboratory
Undergraduate university: Beijing University of Posts and Telecommunications (BUPT)
Grade: 1st-year Ph.D. student (Starting Sep 2025)
Research Interests: Mathematical reasoning in large models, data synthesis, etc.
Junbo Niu (牛俊博)
Joint Ph.D. Program: Peking University (PKU) & Shanghai AI Laboratory
Undergraduate university: Beihang University
Grade: 1st-year Ph.D. student (Starting Sep 2025)
Research Interests: Multimodal Understanding (Video Understanding, OCR) & Data-Centric Machine Learning
Xin Gao (高鑫)
Joint Ph.D. Program: Shanghai Jiao Tong University (SJTU) & Shanghai AI Laboratory
Undergraduate university: University of Electronic Science and Technology of China (UESTC)
Grade: 1st-year Ph.D. student (Starting Sep 2025)
Research Interests: Data synthesis, evaluation, and filtering for large models, etc.
Yu Li (李宇)
Joint Ph.D. Program: University of Science and Technology of China (USTC) & Shanghai AI Laboratory
Undergraduate university: Wuhan University
Grade: 1st-year Ph.D. student (Starting Sep 2025)
Research Interests: Logical reasoning in large models, data synthesis, etc.
Zichen Wen (温子辰)
Joint Ph.D. Program: Shanghai AI Laboratory & Shanghai Jiao Tong University (SJTU)
Undergraduate university: University of Electronic Science and Technology of China (UESTC)
Grade: 1st-year Ph.D. student (Starting Sep 2025)
Research Interests: Efficient AI (including Lightweight and Efficient Large Models for Language/Multimodality, and Data-Efficient Artificial Intelligence)
Zhanping Zhong (钟展平)
Joint Ph.D. Program: Shanghai Jiao Tong University (SJTU) & Shanghai AI Laboratory
Undergraduate university: Beihang University
Grade: 1st-year Ph.D. student (Starting Sep 2025)
Research Interests: LLM agent, data synthesis, data selection, etc.
Xiaoran Shang (尚萧然)
Joint Ph.D. Program: University of Science and Technology of China (USTC) & Shanghai AI Laboratory
Undergraduate university: Wuhan University
Grade: Incoming Ph.D. student (Starting Sep 2026)
Research Interests: Multimodal Large Models, data synthesis, data selection, etc.
Shoupeng Wang (王首鹏)
Joint Ph.D. Program: University of Science and Technology of China (USTC) & Shanghai AI Laboratory
Undergraduate university: Wuhan University
Grade: Incoming Ph.D. student (Starting Sep 2026)
Hejun Dong (董和军)
Joint Ph.D. Program: The Chinese University of Hong Kong & Shanghai AI Laboratory
Undergraduate university: Beihang University
Grade: Incoming Ph.D. student (Starting Sep 2026)
Jie Yang (杨杰)
Joint Ph.D. Program: Shanghai Jiao Tong University (SJTU) & Shanghai AI Laboratory
Undergraduate university: Wuhan University
Grade: Incoming Ph.D. student (Starting Sep 2026)
Chuang Wang (王闯)
Joint Ph.D. Program: Shanghai Jiao Tong University (SJTU) & Shanghai AI Laboratory
Undergraduate university: Beihang University
Grade: Incoming Ph.D. student (Starting Sep 2026)
Research Interests: Multimodal large models, AI for science
Profile: https://chuangwang123.github.io/
Jutao Xiao (肖举涛)
Joint Ph.D. Program: Shanghai Jiao Tong University (SJTU) & Shanghai AI Laboratory
Undergraduate university: Northeastern University
Grade: Incoming Ph.D. student (Starting Sep 2026)