Latest update: October 16, 2025 (🤖 = latest additions)
Note: drop-down filters only work when opened in Google Spreadsheets. Number of entries: 11

Physical Objects and Artifacts

| Manipulation Datasets | Description | Data | Data Types | Camera Views | Robot Hardware | Relevant Applications | Relevant Tasks | Relevant Physical Objects and Artifacts (see repository linked above) | Samples | Tasks | Notes | Link(s) | License | Citation | Year (Initial Release) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 🤖 AgiBot World | AgiBot World is a large-scale platform comprising over 1 million trajectories across 217 tasks in five deployment scenarios, achieving an order-of-magnitude increase in data scale compared to existing datasets. Accelerated by a standardized collection pipeline with human-in-the-loop verification, AgiBot World guarantees high-quality and diverse data distribution. It is extensible from grippers to dexterous hands and visuo-tactile sensors for fine-grained skill acquisition. AgiBot World Beta is the complete dataset featuring over 1M trajectories; AgiBot World Alpha is a subset containing over 92K trajectories. | Real | RGB images, Depth images, Robot pose, Robot velocity, Robot force, Robot torque, Video | External, Wrist | Single arm, Bi-manual, Mobile manipulator, Two-finger, Multi-finger, AgiBot G1 | Commercial/Retail, Logistics/Warehousing, Manufacturing, Service/Domestic | Pick-and-Place, Cloth Folding, Deformable Object Manipulation, Shelf Picking, General Home/Service Tasks | | 1,000,041 | 217 | 100 robots, 100+ real-world scenarios across 5 target domains, 87 types of atomic skills; see the loading sketch below the table | https://huggingface.co/datasets/agibot-world/AgiBotWorld-Beta https://github.com/OpenDriveLab/Agibot-World | CC BY-NC-SA 4.0 | Bu, Qingwen, Jisong Cai, Li Chen, Xiuqi Cui, Yan Ding, Siyuan Feng, Shenyuan Gao et al. "Agibot world colosseo: A large-scale manipulation platform for scalable and intelligent embodied systems." arXiv preprint arXiv:2503.06669 (2025). | 2025 |
| 🤖 BridgeData V2 | BridgeData V2 is a large and diverse dataset of robotic manipulation behaviors designed to facilitate research in scalable robot learning. The dataset is compatible with open-vocabulary, multi-task learning methods conditioned on goal images or natural language instructions. Skills learned from the data generalize to novel objects and environments, as well as across institutions. | Real | RGB images, RGB-D images | External, Wrist | Single arm, Two-finger, WidowX 250 | Assistive Robotics, Service/Domestic | Pick-and-Place, Deformable Object Manipulation | | 60,096 | 8 | | https://rail-berkeley.github.io/bridgedata/ https://github.com/rail-berkeley/bridge_data_v2 | MIT | Walke, Homer Rich, Kevin Black, Tony Z. Zhao, Quan Vuong, Chongyi Zheng, Philippe Hansen-Estruch, Andre Wang He et al. "Bridgedata v2: A dataset for robot learning at scale." In Conference on Robot Learning, pp. 1723-1736. PMLR, 2023. | 2023 |
| 🤖 Kaiwu | Kaiwu provides an integrated human-environment-robot data collection framework with 20 subjects and 30 interaction objects, resulting in a total of 11,664 instances of integrated actions. For each demonstration, hand motions, operation pressures, sounds of the assembly process, multi-view videos, high-precision motion-capture information, eye gaze with first-person videos, and electromyography signals are all recorded. Fine-grained, multi-level annotation based on absolute timestamps and semantic segmentation labelling are performed. | Real | Video, 3D skeleton, Audio, Haptic, Eye gaze, IMU, EMG | External | Human hand | Manufacturing | Assembly | | 11,664 | 30 | 20 human subjects | https://www.scidb.cn/en/detail?dataSetId=33060cd729604d2ca7d41189a9fc492b | | Jiang, Shuo, Haonan Li, Ruochen Ren, Yanmin Zhou, Zhipeng Wang, and Bin He. "Kaiwu: A Multimodal Manipulation Dataset and Framework for Robot Learning and Human-Robot Interaction." IEEE Robotics and Automation Letters, vol. 10, no. 11, pp. 11482-11489, Nov. 2025, doi: 10.1109/LRA.2025.3609615 | 2025 |
| 🤖 LIBERO | LIBERO is designed for studying knowledge transfer in multitask and lifelong robot learning problems. Successfully resolving these problems requires both declarative knowledge about objects/spatial relationships and procedural knowledge about motion/behaviors. LIBERO provides 130 tasks grouped into 4 task suites: LIBERO-Spatial, LIBERO-Object, LIBERO-Goal, and LIBERO-100. | Sim | RGB images | External, Wrist | Single arm, Two-finger, Franka Emika Panda | Assistive Robotics, Commercial/Retail, Service/Domestic | Pick-and-Place, Cloth Folding, Deformable Object Manipulation, Shelf Picking, General Home/Service Tasks | | LIBERO-Spatial: 62,250 frames; LIBERO-Object: 74,507 frames; LIBERO-Goal: 63,728 frames; LIBERO-100: 807,133 frames | LIBERO-Spatial: 10 tasks; LIBERO-Object: 10 tasks; LIBERO-Goal: 10 tasks; LIBERO-100: 100 tasks | | https://libero-project.github.io/datasets https://github.com/Lifelong-Robot-Learning/LIBERO | MIT | Liu, Bo, Yifeng Zhu, Chongkai Gao, Yihao Feng, Qiang Liu, Yuke Zhu, and Peter Stone. "Libero: Benchmarking knowledge transfer for lifelong robot learning." Advances in Neural Information Processing Systems 36 (2023): 44776-44791. | 2023 |
| 🤖 PartInstruct | PartInstruct is the first benchmark for training and evaluating part-level instruction-following manipulation policies. It features 513 object instances across 14 categories, 1,302 manipulation tasks in 16 classes, and over 10,000 expert demonstrations synthesized in a 3D simulator. Each demonstration includes a high-level task instruction, a sequence of basic part-based skills, and ground-truth 3D object data. Additionally, a comprehensive test suite evaluates the generalizability of learned policies across new states, objects, and tasks. | Sim | RGB images, Depth images, Point clouds, Segmentation masks, 3D object model meshes | External | Single arm, Two-finger, Franka Emika Panda | Assistive Robotics, Commercial/Retail, Service/Domestic | Pick-and-Place, Shelf Picking, General Home/Service Tasks, Grasping | | 10,000 | 1,302 | 513 object instances across 14 categories; 16 task classes | https://huggingface.co/datasets/SCAI-JHU/PartInstruct https://github.com/SCAI-JHU/PartInstruct https://partinstruct.github.io/ | MIT | Yin, Yifan, Zhengtao Han, Shivam Aarya, Jianxin Wang, Shuhang Xu, Jiawei Peng, Angtian Wang, Alan Yuille, and Tianmin Shu. "PartInstruct: Part-level Instruction Following for Fine-grained Robot Manipulation." arXiv preprint arXiv:2505.21652 (2025). | 2025 |
| DROID (Distributed Robot Interaction Dataset) | DROID (Distributed Robot Interaction Dataset) is a diverse robot manipulation dataset with 76k demonstration trajectories (350h of interaction data), collected across 564 scenes and 86 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance, greater robustness, and improved generalization ability. We open-source the full dataset, code for policy training, and a detailed guide for reproducing our robot hardware setup. | Real | RGB images, Robot pose, Robot velocity | External, Wrist | Single arm, Two-finger, Franka Emika Panda, Robotiq 2F-85 | Assistive Robotics, Commercial/Retail, Service/Domestic | General Home/Service Tasks | | 76,000 | 86 | | https://colab.research.google.com/drive/1b4PPH4XGht4Jve2xPKMCh-AXXAQziNQa https://droid-dataset.github.io/ | CC BY 4.0 | Khazatsky, Alexander, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany et al. "Droid: A large-scale in-the-wild robot manipulation dataset." arXiv preprint arXiv:2403.12945 (2024). | 2024 |
| Functional Manipulation Benchmark (FMB) | Our dataset consists of objects of diverse appearance and geometry. It requires multi-stage and multi-modal fine motor skills to successfully assemble the pegs onto an unfixed board in a randomized scene. We collected a total of 22,550 trajectories across two different tasks on a Franka Panda arm, recorded from 2 global views and 2 wrist views; each view contains both RGB and depth maps. Two datasets are included: the Single-Object Multi-Stage Manipulation Task Full Dataset and the Multi-Object Multi-Stage Manipulation Task with Assembly 1, 2, and 3. | Real | RGB images, Depth images, Robot pose, Robot velocity, Robot force, Robot torque | External, Wrist | Single arm, Two-finger, Franka Emika Panda | Manufacturing | Assembly | Functional Manipulation Benchmark (FMB) | 22,550 | 2 | | https://functional-manipulation-benchmark.github.io/dataset/index.html | CC BY 4.0 | Luo, Jianlan, Charles Xu, Fangchen Liu, Liam Tan, Zipeng Lin, Jeffrey Wu, Pieter Abbeel, and Sergey Levine. "Fmb: a functional manipulation benchmark for generalizable robotic learning." The International Journal of Robotics Research (2023): 02783649241276017. | 2023 |
| FurnitureBench | FurnitureBench is a real-world furniture assembly benchmark that aims to provide a reproducible and easy-to-use platform for long-horizon, complex robotic manipulation. Furniture assembly poses integral robotic manipulation challenges that autonomous robots must be capable of: long-horizon planning, dexterous control, and robust visual perception. By presenting a well-defined suite of tasks with a lower barrier to entry (large-scale human teleoperation data and standardized configurations), we encourage the research community to push the boundaries of current robotic systems. | Real | RGB-D images, Robot pose, Robot velocity, AprilTag poses, Metadata | External, Wrist | Single arm, Two-finger, Franka Emika Panda | Commercial/Retail, Manufacturing, Service/Domestic | Assembly | FurnitureBench | 5,100 | 9 | | https://clvrai.github.io/furniture-bench/docs/tutorials/dataset.html https://clvrai.github.io/furniture-bench/ | MIT | Heo, Minho, Youngwoon Lee, Doohyun Lee, and Joseph J. Lim. "Furniturebench: Reproducible real-world benchmark for long-horizon complex manipulation." The International Journal of Robotics Research (2023): 02783649241304789. | 2023 |
| Open X-Embodiment | Open X-Embodiment provides datasets in standardized data formats, along with models and experimental results, to enable the study of cross-embodiment ("X-robot") policy learning for robotic manipulation. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160,266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. | Real, Sim | RGB images, Depth images, Robot pose, Robot velocity | External, Wrist | Single arm, Bi-manual, Mobile manipulator, Two-finger, Suction, Robotiq 2F-85, WSG-50 | Assistive Robotics, Commercial/Retail, Service/Domestic | General Home/Service Tasks | | 1,000,000 | 160,266 | 22 robot embodiments across 21 institutions; see the RLDS loading sketch below the table | https://robotics-transformer-x.github.io/ https://github.com/google-deepmind/open_x_embodiment https://docs.google.com/spreadsheets/d/1rPBD77tk60AEIGZrGSODwyyzs5FgCU9Uz3h-3_t2A9g/ | Apache 2.0 | O’Neill, Abby, Abdul Rehman, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley et al. "Open X-Embodiment: Robotic learning datasets and RT-X models." In 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 6892-6903. IEEE, 2024. | 2024 |
| REASSEMBLE (Robotic assEmbly disASSEMBLy datasEt) | REASSEMBLE (Robotic assEmbly disASSEMBLy datasEt) is a new dataset designed specifically for contact-rich manipulation tasks. Built around the NIST Assembly Task Board 1 benchmark, REASSEMBLE includes four actions (pick, insert, remove, and place) involving 17 objects. The dataset contains 4,551 demonstrations, of which 4,035 were successful, spanning a total of 781 minutes. Our dataset features multi-modal sensor data including event cameras, force-torque sensors, microphones, and multi-view RGB cameras. | Real | RGB images, Robot pose, Robot velocity, Robot force, Robot torque, Audio, Event camera | External, Wrist | Single arm, Two-finger, Franka Emika Panda | Manufacturing | Assembly | NIST Assembly Task Boards (ATB) | 4,551 | 2 | Tasks: Assemble, Disassemble | https://tuwien-asl.github.io/REASSEMBLE_page/ https://researchdata.tuwien.ac.at/records/0ewrv-8cb44 | MIT | Sliwowski, Daniel, Shail Jadav, Sergej Stanovcic, Jedrzej Orbik, Johannes Heidersberger, and Dongheui Lee. "Reassemble: A multimodal dataset for contact-rich robotic assembly and disassembly." In Proceedings of Robotics: Science and Systems (RSS) 2025. | 2025 |
| RH20T | RH20T is a dataset comprising over 110,000 contact-rich robot manipulation sequences across diverse skills, contexts, robots, and camera viewpoints, all collected in the real world. Each sequence in the dataset includes visual, force, audio, and action information, along with a corresponding human demonstration video. We have invested significant efforts in calibrating all the sensors and ensuring a high-quality dataset. | Real | RGB images, Depth images, Robot pose, Robot force, Robot torque, IR images, Audio, Tactile | | Single arm, Two-finger, Franka Emika Panda, UR5, Flexiv, DH Robotics AG-95, Robotiq 2F-85, WSG-50 | Assistive Robotics, Commercial/Retail, Service/Domestic | General Home/Service Tasks | | 110,000 | 147 | Tasks: 48 from RLBench, 29 from MetaWorld, 70 self-proposed | https://rh20t.github.io/ https://github.com/rh20t/rh20t_api | CC BY-NC 4.0, CC BY-SA 4.0, MIT | Fang, Hao-Shu, Hongjie Fang, Zhenyu Tang, Jirong Liu, Chenxi Wang, Junbo Wang, Haoyi Zhu, and Cewu Lu. "Rh20t: A comprehensive robotic dataset for learning diverse skills in one-shot." In 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 653-660. IEEE, 2024. | 2024 |
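
Several of the newer entries above (e.g., AgiBot World, PartInstruct) are published as Hugging Face dataset repositories. The sketch below shows one generic way to mirror such a repository locally with the `huggingface_hub` client; it is a minimal sketch, not the datasets' official loader — the repo ID comes from the Link(s) column, while the local path and any subset filtering are assumptions, so check each dataset's own documentation for its supported formats and loaders.

```python
# Minimal sketch: mirror a Hugging Face-hosted dataset repo locally.
# Assumptions: repo ID taken from the table's Link(s) column; these repos are
# very large, so in practice you would likely pass allow_patterns to fetch
# only a small subset for inspection.
from huggingface_hub import snapshot_download  # pip install huggingface_hub

local_dir = snapshot_download(
    repo_id="agibot-world/AgiBotWorld-Beta",  # e.g., the AgiBot World Beta entry
    repo_type="dataset",
)
print("Dataset files downloaded to:", local_dir)
```

Entries distributed in RLDS format (for example, Open X-Embodiment and DROID, per their project pages) are commonly read with `tensorflow_datasets`. The following is a minimal sketch assuming public GCS hosting under the path pattern used in the Open X-Embodiment materials; the exact bucket path and version string are assumptions to verify against the linked Colab and project pages.

```python
# Minimal sketch: stream an RLDS-formatted sub-dataset with tensorflow_datasets.
# Assumption: the builder directory below follows the gs://gresearch/robotics/...
# layout referenced by the Open X-Embodiment materials.
import tensorflow_datasets as tfds  # pip install tensorflow tensorflow-datasets

builder = tfds.builder_from_directory(
    builder_dir="gs://gresearch/robotics/bridge/0.1.0/"  # one example sub-dataset
)
ds = builder.as_dataset(split="train[:10]")  # small slice for quick inspection

for episode in ds.take(1):
    # Each RLDS episode is a dictionary; the 'steps' entry holds the
    # per-timestep observations and actions.
    print(list(episode.keys()))
```

Both snippets only fetch or stream raw data; converting it into a form usable by a particular policy-learning codebase (e.g., the repos linked in the table) is dataset-specific and is documented by each project.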