Selected Publications
Data Intelligence
BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception
NeurIPS 2025
Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning
NeurIPS 2025
MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer
EMNLP 2025
Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning
EMNLP 2025
Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models
ACL 2025
OpenHuEval: Evaluating Large Language Model on Hungarian Specifics
ACL 2025
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis
ACL 2025
CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenge
ACL 2025
Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation
ICLR 2025
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
ICLR 2025 (Spotlight)
Harnessing Diversity for Important Data Selection in Pretraining Large Language Models
ICLR 2025 (Spotlight)
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
CVPR 2025
Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching
CVPR 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
ICLR 2025
Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation
arXiv 2025
LEGION: Learning to Ground and Explain for Synthetic Image Detection
arXiv 2025
Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining
arXiv 2024
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
arXiv 2024
Large Language Models and Multimodal LLMs
Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning
NeurIPS 2025
Multi-step Visual Reasoning with Visual Tokens Scaling and Verification
NeurIPS 2025
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
NeurIPS 2025
Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More
EMNLP 2025
Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?
ACL 2025
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
ACL 2025
MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion
ACL 2025
Harnessing Diversity for Important Data Selection in Pretraining Large Language Models
ICLR 2025
Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis
ICCV 2025
Where am I? Cross-View Geo-localization with Natural Language Descriptions
ICCV 2025
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
ICCV 2025
MLLM-DataEngine: An Iterative Refinement Approach for MLLM
ICME 2025
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
ICME 2025
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations
ACL 2024
Parrot Captions Teach CLIP to Spot Text
ECCV 2024
VIGC: Visual Instruction Generation and Correction
AAAI 2024
Utilize the Flow Before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning
AAAI 2025
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
AAAI 2025
GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation
NAACL 2025 (Findings)
AI for Science
3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization
ICLR 2025
Fast and Accurate Blind Flexible Docking
ICLR 2025
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis
AAAI 2025
SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation
CVPR 2024 (Highlight)
3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions
CVPR 2024
Cross-View Image Geo-Localization with Panorama-BEV Co-Retrieval Network
ECCV 2024
OmniCity: Omnipotent City Understanding with Multi-Level and Multi-View Images
CVPR 2023
Note: * denotes equal contribution, † denotes corresponding author