We aim to build a useful, reproducible, and democratized benchmark for learning household robotic manipulation from human videos. Realizing this goal requires a diverse, high-quality human video dataset curated specifically for robots. To evaluate learning progress, roboticists and AI researchers also need a simulated twin environment that resembles the appearance and dynamics of the physical world, so that they can validate their algorithms convincingly and efficiently before testing on a real robot. We introduce RoboTube, a benchmark platform that lowers the barrier to robotics research and facilitates reproducible research in the community.


We build a diverse and high-quality human video demonstration dataset with multiple functionalities.

Construction Overview


To benchmark the baseline methods, we construct a suite of simulated twin environments, RT-sim. With RT-sim, researchers can compare their approaches fairly against the baseline methods and validate their algorithms convincingly and efficiently before conducting more complex experiments on real robots.

Latest Paper Version: OpenReview

