OpenDataLab provides a comprehensive suite of resources for researchers, practitioners, and students interested in data-centric AI. Below you'll find our datasets, tools and platforms that support cutting-edge AI research and development.
Datasets
OmniDocBench
A high-quality multi-source evaluation benchmark that pioneers a new paradigm for document parsing assessment
Wanjuan Silkroad
The first large-scale multilingual corpus covering mainstream modalities
Shusheng Wanjuan
Shanghai AI Lab's first open-source high-quality multimodal pretraining corpus for large models
Platforms & Tools
OpenDataLab
China's most influential large-model data platform by data volume and scale
LabelU
A flexible annotation tool that supports multiple data formats and freely configurable combinations of annotation tools
More Resources
For additional resources and tools, please visit OpenDataLab's Open Source Tools platform. The platform provides a comprehensive collection of AI development resources.