| A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Latest update: 🤖 = latest additions | March 20, 2026 | ||||||||||||||||
2 | Note: drop-down filters only work when open in Google Spreadsheets | Number of entries: 21 | ||||||||||||||||
3 | Physical Objects and Artifacts | |||||||||||||||||
4 | Manipulation Datasets | Image | Description | Data | Data Types | Camera Views | Robot Hardware | Relevant Applications | Relevant Tasks | Relevant Physical Objects and Artifacts (see repository linked above) | Samples | Tasks | Notes | Link(s) | License | Citation | Year (Initial Release) | |
5 | 🤖 Dex1B: Articulation | Dex1B is a large-scale, diverse, and high-quality demonstration dataset generated using generative models. It contains one billion demonstrations for two fundamental dexterous manipulation tasks: grasping and articulation. The dataset was constructed using DexSimple, a generative model that integrates geometric constraints to improve feasibility and incorporates additional conditions to enhance diversity. For grasping, 1 million scenes were constructed using object assets from Objaverse. For articulation, scenes were constructed using object assets from PartNet-Mobility. | Sim | Point clouds, Robot pose, Action sequences, Depth maps | Single-view | UR5, Unitree G1 | Application Agnostic | Articulated Object Manipulation, Grasping | | 1,000,000,000 | 2 fundamental tasks (Grasping and Articulation) across 6,000+ objects | https://jianglongye.com/dex1b/ | | Ye, Jianglong, Keyi Wang, Chengjing Yuan, Ruihan Yang, Yiquan Li, Jiyue Zhu, Yuzhe Qin, Xueyan Zou, and Xiaolong Wang. "Dex1b: Learning with 1b demonstrations for dexterous manipulation." arXiv preprint arXiv:2506.17198 (2025). | 2025 | |||
6 | 🤖 Flat'n'Fold | Comprising 1,212 human and 887 robot demonstrations of flattening and folding 44 unique garments across 8 categories, Flat'n'Fold surpasses prior datasets in size, scope, and diversity. The dataset uniquely captures the entire manipulation process from crumpled to folded states, providing synchronized multi-view RGB-D images, point clouds, and action data, including hand or gripper positions and rotations. It includes both human demonstrations (20 participants) and human-controlled robot demonstrations. | Real | RGB images, Depth images, Point clouds, Action sequences, Robot pose, Robot joint states, Tracker data | Multi-view | Rethink Robotics Baxter | Service/Domestic | Deformable Object Manipulation | | 2,099 | 2 main tasks (Flattening and Folding) across 44 unique garments in 8 categories | Includes 6,329 human and 5,574 robot annotated point clouds for grasping point prediction benchmark. ~20,000 annotated sub-task boundaries for task decomposition | https://cvas-ug.github.io/flat-n-fold | CC BY 4.0 | Zhuang, Lipeng, Shiyu Fan, Yingdong Ru, Florent P. Audonnet, Paul Henderson, and Gerardo Aragon-Camarasa. "Flat'n'Fold: A Diverse Multi-Modal Dataset for Garment Perception and Manipulation." In 2025 IEEE International Conference on Robotics and Automation (ICRA), pp. 7937-7944. IEEE, 2025. | 2025 | ||
7 | 🤖 Galaxea Open-World Dataset | Galaxea Open-World Dataset is a large-scale, diverse collection of robot behaviors recorded in authentic human living and working environments, comprising 500+ hours of real-world mobile manipulation data. It contains 100,000 demonstration trajectories spanning 150+ task categories across 50 distinct real-world scenes at 11 physical sites, covering residential, catering, retail, and office spaces. All demonstrations are gathered using a single consistent robotic embodiment (Galaxea R1 Lite) and enriched with precise subtask-level language annotations to facilitate both training and evaluation. | Real | RGB images, Depth images, Robot joint states, Action sequences | External | Galaxea R1 Lite, Dual Galaxea A1X | Service/Domestic, Commercial/Retail | Pick-and-Place, General Home/Service Tasks | | 100,000 | 150+ task categories, 58 operational skills | Data collected at 11 physical sites yielding 50 unique scenes; 1,600+ unique real-world objects sourced from retail suppliers across residential, kitchen, retail, and office environments | https://opengalaxea.github.io/G0/ https://huggingface.co/datasets/OpenGalaxea/Galaxea-Open-World-Dataset | CC BY-NC-SA 4.0 | Jiang, Tao, Tianyuan Yuan, Yicheng Liu, Chenhao Lu, Jianning Cui, Xiao Liu, Shuiqi Cheng, Jiyang Gao, Huazhe Xu, and Hang Zhao. "Galaxea open-world dataset and g0 dual-system vla model." arXiv preprint arXiv:2509.00576 (2025). | 2025 | ||
8 | 🤖 Hand–Object to Robot Action Dataset (HORA) | HORA is a large-scale multimodal dataset for cross-embodiment robotic learning from human hand-object interactions (HOI). Built on the RoboWheel pipeline, it unifies heterogeneous HOI sources through physically plausible reconstruction, canonical action-space alignment, cross-embodiment retargeting, and simulation-based augmentation. It provides both HOI and robot-oriented modalities, including MANO hand parameters, 6-DoF object poses, contact annotations, robot observations, end-effector trajectories, and dense tactile signals for the mocap subset. | Real | RGB images, Depth images, 6D poses, Robot joint states | External | UR5, Franka Emika Panda, KUKA LBR iiwa 7, Kinova Gen3 | Application Agnostic | Pick-and-Place, Human-Robot Handovers, General Home/Service Tasks | | 150,000 | Covers diverse manipulation tasks derived from multiple public HOI datasets plus custom recordings | https://zhangyuhong01.github.io/Robowheel/ https://huggingface.co/datasets/HORA-DB/HORA https://github.com/zhangyuhong01/Robowheel-Toolkits | | Zhang, Yuhong, Zihan Gao, Shengpeng Li, Ling-Hao Chen, Kaisheng Liu, Runqing Cheng, Xiao Lin, Junjia Liu, Zhuoheng Li, Jingyi Feng, Ziyan He, Jintian Lin, Zheyan Huang, Zhifang Liu, and Haoqian Wang. "RoboWheel: A Data Engine from Real-World Human Demonstrations for Cross-Embodiment Robotic Learning." arXiv preprint arXiv:2512.02729 (2025). | 2025 | |||
9 | 🤖 Purpose-driven Robotic Interaction in Scene Manipulation (PRISM) | PRISM is a large-scale synthetic dataset created to overcome the limitations of prior small-scale, simplistic task-oriented grasping datasets. It is constructed by composing 10,000 procedurally-generated cluttered scenes using 2,356 object instances from ShapeNet-Sem paired with stable grasps from the ACRONYM dataset, resulting in 378,844 task-grasp samples. Each scene is rendered from 10 camera views, and every sample contains a rendered RGB-D scene, a natural language task description, a calibrated 6-DoF grasp pose, a spatial grasp description, and a pixel-level grasp location. | Sim | RGB images, Depth images, 6D poses | Multi-view | Franka Emika Panda | Service/Domestic | Pick-and-Place, Tool Use | | 378,844 | 568 unique task categories | 2,356 diverse object instances from ShapeNet-Sem covering household, kitchen, office, and tool categories | https://abhaybd.github.io/GraspMolmo/ https://huggingface.co/datasets/allenai/PRISM | MIT | Deshpande, Abhay, Yuquan Deng, Arijit Ray, Jordi Salvador, Winson Han, Jiafei Duan, Kuo-Hao Zeng, Yuke Zhu, Ranjay Krishna, and Rose Hendrix. "Graspmolmo: Generalizable task-oriented grasping via large-scale synthetic data generation." arXiv preprint arXiv:2505.13441 (2025). | 2025 | ||
10 | 🤖 RefSpatial | RefSpatial is a large-scale, multi-source dataset of 2.5 million high-quality examples totaling 20 million QA pairs, designed to train and fine-tune Vision-Language Models (VLMs) on spatial referring tasks for robotics. It integrates data from three complementary source types: 2D web images (OpenImages) for broad spatial concepts and depth perception, 3D embodied videos (CA-1M) for fine-grained indoor scene understanding, and procedurally generated simulated data with ground-truth reasoning chains to support multi-step spatial referring (up to 5 steps). | Real, Sim | RGB images, Depth images, 3D skeleton | Single-view | UR5, Unitree G1 | Application Agnostic | Pick-and-Place | | 2,500,000 | 31 spatial relations (left/right, above/below, front/back, near/far, metric distance, orientation, etc.); single-step and multi-step spatial reasoning (up to 5 steps) | https://huggingface.co/datasets/JingkunAn/RefSpatial https://zhoues.github.io/RoboRefer/ | | Zhou, Enshen, Jingkun An, Cheng Chi, Yi Han, Shanyu Rong, Chi Zhang, Pengwei Wang, Zhongyuan Wang, Tiejun Huang, Lu Sheng, and Shanghang Zhang. "Roborefer: Towards spatial referring with reasoning in vision-language models for robotics." arXiv preprint arXiv:2506.04308 (2025). | 2025 | |||
11 | 🤖 Robo360 | Robo360 is the first real-world omnispective multi-view and multi-material robotic manipulation dataset, designed to bridge 3D scene understanding with robot control. It features robotic manipulation captured with dense 360° surrounding view coverage, enabling high-quality 3D neural representation learning (particularly dynamic NeRF) and multi-view policy learning. The dataset contains over 2,000 demonstration trajectories of more than 100 distinct objects with diverse material variations including rigid, deformable, transparent, and reflective objects. | Real | Video, Audio, Robot joint states | Multi-view | Single arm | Application Agnostic | Articulated Object Manipulation, Pick-and-Place | | 2,000 | Diverse object manipulation tasks across 100+ objects with varying material properties | https://huggingface.co/datasets/liuyubian/Robo360 | | Liang, Litian, Liuyu Bian, Caiwei Xiao, Jialin Zhang, Linghao Chen, Isabella Liu, Fanbo Xiang, Zhiao Huang, and Hao Su. "Robo360: a 3D omnispective multi-material robotic manipulation dataset." arXiv preprint arXiv:2312.06686 (2023). | 2023 | |||
12 | 🤖 RoboCerebra | RoboCerebra is a large-scale benchmark for evaluating high-level System 2 reasoning in long-horizon robotic manipulation, targeting capabilities such as planning, reflection, and episodic memory that are underexplored by existing reactive (System 1) benchmarks. It features 1,000 human-annotated simulation trajectories across 100 task variants, each spanning up to 3,000 simulation steps, constructed via a top-down pipeline where GPT generates task instructions and decomposes them into subtask sequences which human operators then execute in simulation. | Sim | RGB images, Video | Single-view, External | Single arm | Service/Domestic | General Home/Service Tasks | | 1,000 | 100 task variants across 6 subtask-type categories; 4-15 subtask steps per trajectory | Common household objects including cream cheese, popcorn, butter, cookies, wine bottles, tomato sauce, BBQ sauce, chocolate pudding, alphabet soup, milk, frying pans | https://github.com/qiuboxiang/RoboCerebra https://huggingface.co/datasets/qiukingballball/RoboCerebra https://robocerebra.github.io/ | MIT | Han, Songhao, Boxiang Qiu, Yue Liao, Siyuan Huang, Chen Gao, Shuicheng Yan, and Si Liu. "Robocerebra: A large-scale benchmark for long-horizon robotic manipulation evaluation." arXiv preprint arXiv:2506.06677 (2025). | 2025 | ||
13 | 🤖 RoboMIND | RoboMIND is the largest multi-embodiment teleoperation dataset collected on a unified standardized platform, comprising 107K real-world demonstration trajectories spanning 479 distinct tasks across 96 unique object classes and amounting to 305.5 hours of interaction data. It covers four distinct robot embodiments (Franka Emika Panda, UR5e, AgileX Cobot Magic V2.0, and the Tien Kung humanoid) and uniquely includes 5,000 real-world failure demonstrations, each annotated with the cause of failure to enable failure reflection and correction during policy learning. | Real | RGB images, Depth images, Robot joint states, Action sequences | External, Multi-view | Franka Emika Panda, UR5, Tien Kung Humanoid, AgileX Cobot Magic V2.0 | Service/Domestic | Articulated Object Manipulation | | 107,000 | 479 distinct tasks (v1.2); 279 tasks in initial v1.0 release | 96 object classes across domestic, kitchen, industrial, office, and retail categories | https://huggingface.co/datasets/x-humanoid-robomind/RoboMIND https://github.com/x-humanoid-robomind/x-humanoid-robomind.github.io https://x-humanoid-robomind.github.io/ | Apache 2.0 | Wu, Kun, Chengkai Hou, Jiaming Liu, Zhengping Che, Xiaozhu Ju, Zhuqin Yang, Meng Li, Yinuo Zhao, Zhiyuan Xu, Guang Yang, Shichao Fan, Xinhua Wang, Fei Liao, Zhen Zhao, Guangyu Li, Zhao Jin, Lecheng Wang, Jilei Mao, Ning Liu, Pei Ren, Qiang Zhang, Yaoxu Lyu, Mengzhen Liu, Jingyang He, Yulin Luo, Zeyu Gao, Chenxuan Li, Chenyang Gu, Yankai Fu, Di Wu, Xingyu Wang, Sixiang Chen, Zhenyu Wang, Pengju An, Siyuan Qian, Shanghang Zhang, and Jian Tang. "Robomind: Benchmark on multi-embodiment intelligence normative data for robot manipulation." arXiv preprint arXiv:2412.13877 (2024). | 2024 | ||
14 | 🤖 RoboVerse | RoboVerse is a comprehensive unified framework comprising a simulation platform (MetaSim), a large-scale synthetic dataset, and standardized benchmarks for both imitation learning and reinforcement learning, designed to overcome the data-scaling and evaluation-standardization bottlenecks in robot learning. The dataset contains ~500K unique high-fidelity trajectories covering 276 task categories with ~5.5K assets and over 10 million transitions. | Sim | RGB images, Depth images, Robot joint states, 6D poses, Action sequences | External, Multi-view | Franka Emika Panda, Unitree G1 | Application Agnostic | Pick-and-Place, Articulated Object Manipulation | | 500,000 | 276 task categories; 1,000+ distinct task variants; Open6DOR subset alone has 5,000+ tasks across position, rotation, and 6-DoF tracks | https://roboverseorg.github.io/ | Apache 2.0 | Geng, Haoran, Feishi Wang, Songlin Wei, Yuyang Li, Bangjun Wang, Boshi An, Charlie Tianyue Cheng, Haozhe Lou, Peihao Li, Yen-Jen Wang, Yutong Liang, Dylan Goetting, Chaoyi Xu, Haozhe Chen, Yuxi Qian, Yiran Geng, Jiageng Mao, Weikang Wan, Mingtong Zhang, Jiangran Lyu, Siheng Zhao, Jiazhao Zhang, Jialiang Zhang, Chengyang Zhao, Haoran Lu, Yufei Ding, Ran Gong, Yuran Wang, Yuxuan Kuang, Ruihai Wu, Baoxiong Jia, Carlo Sferrazza, Hao Dong, Siyuan Huang, Yue Wang, Jitendra Malik, and Pieter Abbeel. "Roboverse: Towards a unified platform, dataset and benchmark for scalable and generalizable robot learning." arXiv preprint arXiv:2504.18904 (2025). | 2025 | |||
15 | AgiBot World | AgiBot World is a large-scale platform comprising over 1 million trajectories across 217 tasks in five deployment scenarios, achieving an order-of-magnitude increase in data scale compared to existing datasets. Accelerated by a standardized collection pipeline with human-in-the-loop verification, AgiBot World guarantees high-quality and diverse data distribution. It is extensible from grippers to dexterous hands and visuo-tactile sensors for fine-grained skill acquisition. AgiBot World Beta is the complete dataset featuring over 1M trajectories, while Alpha is a subset containing over 92K trajectories. | Real | RGB images, Depth images, Robot pose, Robot velocity, Robot force, Robot torque, Video | External, Wrist | Single arm, Bi-manual, Mobile manipulator, Two-finger, Multi-finger, AgiBot G1 | Commercial/Retail, Logistics/Warehousing, Manufacturing, Service/Domestic | Pick-and-Place, Cloth Folding, Deformable Object Manipulation, Shelf Picking, General Home/Service Tasks | | 1,000,041 | 217 | 100 robots, 100+ real-world scenarios across 5 target domains, 87 types of atomic skills | https://huggingface.co/datasets/agibot-world/AgiBotWorld-Beta https://github.com/OpenDriveLab/Agibot-World | CC BY-NC-SA 4.0 | Bu, Qingwen, Jisong Cai, Li Chen, Xiuqi Cui, Yan Ding, Siyuan Feng, Shenyuan Gao et al. "Agibot world colosseo: A large-scale manipulation platform for scalable and intelligent embodied systems." arXiv preprint arXiv:2503.06669 (2025). | 2025 | ||
16 | BridgeData V2 | BridgeData V2 is a large and diverse dataset of robotic manipulation behaviors designed to facilitate research in scalable robot learning. The dataset is compatible with open-vocabulary, multi-task learning methods conditioned on goal images or natural language instructions. Skills learned from the data generalize to novel objects and environments, as well as across institutions. | Real | RGB images, RGB-D images | External, Wrist | Single arm, Two-finger, WidowX 250 | Assistive Robotics, Service/Domestic | Pick-and-Place, Deformable Object Manipulation | | 60,096 | 8 | https://rail-berkeley.github.io/bridgedata/ https://github.com/rail-berkeley/bridge_data_v2 | MIT | Walke, Homer Rich, Kevin Black, Tony Z. Zhao, Quan Vuong, Chongyi Zheng, Philippe Hansen-Estruch, Andre Wang He et al. "Bridgedata v2: A dataset for robot learning at scale." In Conference on Robot Learning, pp. 1723-1736. PMLR, 2023. | 2023 | |||
17 | DROID (Distributed Robot Interaction Dataset) | DROID (Distributed Robot Interaction Dataset) is a diverse robot manipulation dataset with 76k demonstration trajectories or 350h of interaction data, collected across 564 scenes and 86 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. The authors demonstrate that training with DROID leads to policies with higher performance, greater robustness, and improved generalization ability. The full dataset, policy-training code, and a detailed guide for reproducing the robot hardware setup are open-sourced. | Real | RGB images, Robot pose, Robot velocity | External, Wrist | Single arm, Two-finger, Franka Emika Panda, Robotiq 2F-85 | Assistive Robotics, Commercial/Retail, Service/Domestic | General Home/Service Tasks | | 76,000 | 86 | https://colab.research.google.com/drive/1b4PPH4XGht4Jve2xPKMCh-AXXAQziNQa https://droid-dataset.github.io/ | CC BY 4.0 | Khazatsky, Alexander, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany et al. "Droid: A large-scale in-the-wild robot manipulation dataset." arXiv preprint arXiv:2403.12945 (2024). | 2024 | |||
18 | Functional Manipulation Benchmark (FMB) | The dataset consists of objects with diverse appearance and geometry. It requires multi-stage and multi-modal fine motor skills to successfully assemble pegs onto an unfixed board in a randomized scene. A total of 22,550 trajectories were collected across two different tasks on a Franka Panda arm, recorded from 2 global views and 2 wrist views, each containing both RGB and depth maps. Two datasets are included: Single-Object Multi-Stage Manipulation Task Full Dataset and Multi-Object Multi-Stage Manipulation Task with Assembly 1, 2, and 3. | Real | RGB images, Depth images, Robot pose, Robot velocity, Robot force, Robot torque | External, Wrist | Single arm, Two-finger, Franka Emika Panda | Manufacturing | Assembly | Functional Manipulation Benchmark (FMB) | 22,550 | 2 | https://functional-manipulation-benchmark.github.io/dataset/index.html | CC BY 4.0 | Luo, Jianlan, Charles Xu, Fangchen Liu, Liam Tan, Zipeng Lin, Jeffrey Wu, Pieter Abbeel, and Sergey Levine. "Fmb: a functional manipulation benchmark for generalizable robotic learning." The International Journal of Robotics Research (2023): 02783649241276017. | 2023 | |||
19 | FurnitureBench | FurnitureBench is a real-world furniture assembly benchmark that aims to provide a reproducible and easy-to-use platform for long-horizon, complex robotic manipulation. Furniture assembly poses integral robotic manipulation challenges that autonomous robots must be capable of: long-horizon planning, dexterous control, and robust visual perception. By presenting a well-defined suite of tasks with a lower barrier to entry (large-scale human teleoperation data and standardized configurations), it encourages the research community to push the boundaries of current robotic systems. | Real | RGB-D images, Robot pose, Robot velocity, AprilTag poses, Metadata | External, Wrist | Single arm, Two-finger, Franka Emika Panda | Commercial/Retail, Manufacturing, Service/Domestic | Assembly | FurnitureBench | 5,100 | 9 | https://clvrai.github.io/furniture-bench/docs/tutorials/dataset.html https://clvrai.github.io/furniture-bench/ | MIT | Heo, Minho, Youngwoon Lee, Doohyun Lee, and Joseph J. Lim. "Furniturebench: Reproducible real-world benchmark for long-horizon complex manipulation." The International Journal of Robotics Research (2023): 02783649241304789. | 2023 | |||
20 | Kaiwu | Kaiwu provides an integrated human, environment, and robot data collection framework with 20 subjects and 30 interaction objects, resulting in a total of 11,664 instances of integrated actions. For each demonstration, hand motions, operation pressures, sounds of the assembly process, multi-view videos, high-precision motion capture information, eye gaze with first-person videos, and electromyography signals are all recorded. Fine-grained multi-level annotation based on absolute timestamps and semantic segmentation labelling are performed. | Real | Video, 3D skeleton, Audio, Haptic, Eye gaze, IMU, EMG | External | Human hand | Manufacturing | Assembly | | 11,664 | 30 | 20 human subjects | https://www.scidb.cn/en/detail?dataSetId=33060cd729604d2ca7d41189a9fc492b | | Jiang, Shuo, Haonan Li, Ruochen Ren, Yanmin Zhou, Zhipeng Wang, and Bin He. "Kaiwu: A Multimodal Manipulation Dataset and Framework for Robot Learning and Human-Robot Interaction." IEEE Robotics and Automation Letters, vol. 10, no. 11, pp. 11482-11489, Nov. 2025, doi: 10.1109/LRA.2025.3609615 | 2025 | ||
21 | LIBERO | LIBERO is designed for studying knowledge transfer in multitask and lifelong robot learning problems. Successfully resolving these problems requires both declarative knowledge about objects/spatial relationships and procedural knowledge about motion/behaviors. LIBERO provides 130 tasks grouped into 4 task suites: LIBERO-Spatial, LIBERO-Object, LIBERO-Goal, and LIBERO-100. | Sim | RGB images | External, Wrist | Single arm, Two-finger, Franka Emika Panda | Assistive Robotics, Commercial/Retail, Service/Domestic | Pick-and-Place, Cloth Folding, Deformable Object Manipulation, Shelf Picking, General Home/Service Tasks | | LIBERO-Spatial: 62,250 frames LIBERO-Object: 74,507 frames LIBERO-Goal: 63,728 frames LIBERO-100: 807,133 frames | LIBERO-Spatial: 10 tasks LIBERO-Object: 10 tasks LIBERO-Goal: 10 tasks LIBERO-100: 100 tasks | https://libero-project.github.io/datasets https://github.com/Lifelong-Robot-Learning/LIBERO | MIT | Liu, Bo, Yifeng Zhu, Chongkai Gao, Yihao Feng, Qiang Liu, Yuke Zhu, and Peter Stone. "Libero: Benchmarking knowledge transfer for lifelong robot learning." Advances in Neural Information Processing Systems 36 (2023): 44776-44791. | 2023 | |||
22 | Open X-Embodiment | Open X-Embodiment provides datasets in standardized data formats, along with models and experimental results, to explore cross-embodiment ("X-robot") policy learning in the context of robotic manipulation. The dataset is assembled from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160,266 tasks). A high-capacity model trained on this data, called RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. | Real, Sim | RGB images, Depth images, Robot pose, Robot velocity | External, Wrist | Single arm, Bi-manual, Mobile manipulator, Two-finger, Suction, Robotiq 2F-85, WSG-50 | Assistive Robotics, Commercial/Retail, Service/Domestic | General Home/Service Tasks | | 1,000,000 | 160,266 | 22 robot embodiments across 21 institutions | https://robotics-transformer-x.github.io/ https://github.com/google-deepmind/open_x_embodiment https://docs.google.com/spreadsheets/d/1rPBD77tk60AEIGZrGSODwyyzs5FgCU9Uz3h-3_t2A9g/ | Apache 2.0 | O’Neill, Abby, Abdul Rehman, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley et al. "Open x-embodiment: Robotic learning datasets and rt-x models: Open x-embodiment collaboration 0." In 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 6892-6903. IEEE, 2024. | 2024 | ||
23 | PartInstruct | PartInstruct is the first benchmark for training and evaluating models that follow part-level instructions for fine-grained robot manipulation. It features 513 object instances across 14 categories, 1302 manipulation tasks in 16 classes, and over 10,000 expert demonstrations synthesized in a 3D simulator. Each demonstration includes a high-level task instruction, a sequence of basic part-based skills, and ground-truth 3D object data. Additionally, it provides a comprehensive test suite to evaluate the generalizability of learned policies across new states, objects, and tasks. | Sim | RGB images, Depth images, Point clouds, Segmentation masks, 3D object model meshes | External | Single arm, Two-finger, Franka Emika Panda | Assistive Robotics, Commercial/Retail, Service/Domestic | Pick-and-Place, Shelf Picking, General Home/Service Tasks, Grasping | | 10,000 | 1,302 | 513 object instances across 14 categories 16 task classes | https://huggingface.co/datasets/SCAI-JHU/PartInstruct https://github.com/SCAI-JHU/PartInstruct https://partinstruct.github.io/ | MIT | Yin, Yifan, Zhengtao Han, Shivam Aarya, Jianxin Wang, Shuhang Xu, Jiawei Peng, Angtian Wang, Alan Yuille, and Tianmin Shu. "PartInstruct: Part-level Instruction Following for Fine-grained Robot Manipulation." arXiv preprint arXiv:2505.21652 (2025). | 2025 | ||
24 | REASSEMBLE (Robotic assEmbly disASSEMBLy datasEt) | REASSEMBLE (Robotic assEmbly disASSEMBLy datasEt) is a new dataset designed specifically for contact-rich manipulation tasks. Built around the NIST Assembly Task Board 1 benchmark, REASSEMBLE includes four actions (pick, insert, remove, and place) involving 17 objects. The dataset contains 4,551 demonstrations, of which 4,035 were successful, spanning a total of 781 minutes. It features multi-modal sensor data including event cameras, force-torque sensors, microphones, and multi-view RGB cameras. | Real | RGB images, Robot pose, Robot velocity, Robot force, Robot torque, Audio, Event camera | External, Wrist | Single arm, Two-finger, Franka Emika Panda | Manufacturing | Assembly | NIST Assembly Task Boards (ATB) | 4,551 | 2 | Tasks: Assemble, Disassemble | https://tuwien-asl.github.io/REASSEMBLE_page/ https://researchdata.tuwien.ac.at/records/0ewrv-8cb44 | MIT | Sliwowski, Daniel, Shail Jadav, Sergej Stanovcic, Jedrzej Orbik, Johannes Heidersberger, and Dongheui Lee. "Reassemble: A multimodal dataset for contact-rich robotic assembly and disassembly." In Proceedings of Robotics: Science and Systems (RSS) 2025. | 2025 | ||
25 | RH20T | RH20T is a dataset comprising over 110,000 contact-rich robot manipulation sequences across diverse skills, contexts, robots, and camera viewpoints, all collected in the real world. Each sequence in the dataset includes visual, force, audio, and action information, along with a corresponding human demonstration video. All sensors were carefully calibrated to ensure a high-quality dataset. | Real | RGB images, Depth images, Robot pose, Robot force, Robot torque, IR images, Audio, Tactile | | Single arm, Two-finger, Franka Emika Panda, UR5, Flexiv, DH Robotics AG-95, Robotiq 2F-85, WSG-50 | Assistive Robotics, Commercial/Retail, Service/Domestic | General Home/Service Tasks | | 110,000 | 147 | Tasks: 48 from RLBench, 29 from MetaWorld, 70 self-proposed | https://rh20t.github.io/ https://github.com/rh20t/rh20t_api | CC BY-NC 4.0, CC-BY-SA 4.0, MIT | Fang, Hao-Shu, Hongjie Fang, Zhenyu Tang, Jirong Liu, Chenxi Wang, Junbo Wang, Haoyi Zhu, and Cewu Lu. "Rh20t: A comprehensive robotic dataset for learning diverse skills in one-shot." In 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 653-660. IEEE, 2024. | 2024
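
Many of the datasets above are distributed through the Hugging Face Hub (see the Link(s) column). Below is a minimal sketch of fetching one such dataset repository locally with the standard `huggingface_hub` client, using the RoboMIND repo id from its Link(s) cell as an example; file layouts and trajectory formats differ per dataset, and the `allow_patterns` filter shown is only an illustrative choice to grab lightweight metadata first, so consult each dataset's own repository or toolkit for the supported loader.

```python
# Minimal sketch: download files from one of the Hugging Face-hosted datasets listed above.
# The repo id (RoboMIND) comes from that entry's Link(s) cell; any other Hub-hosted entry
# can be substituted. This only mirrors files locally; parsing trajectories is dataset-specific.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="x-humanoid-robomind/RoboMIND",  # dataset repo id from the Link(s) column
    repo_type="dataset",                     # these are dataset repos, not model repos
    allow_patterns=["*.md", "*.json"],       # illustrative filter: metadata only; remove to fetch everything
)
print("Downloaded to:", local_dir)
```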