Grauman Kristen, Westbury Andrew, Byrne Eugene, Cartillier Vincent, Chavis Zachary, Furnari Antonino, Girdhar Rohit, Hamburger Jackson, Jiang Hao, Kukreja Devansh, Liu Miao, Liu Xingyu, Martin Miguel, Nagarajan Tushar, Radosavovic Ilija, Ramakrishnan Santhosh Kumar, Ryan Fiona, Sharma Jayant, Wray Michael, Xu Mengmeng, Xu Eric Zhongcong, Zhao Chen, Bansal Siddhant, Batra Dhruv, Crane Sean, Do Tien, Doulaty Morrie, Erapalli Akshay, Feichtenhofer Christoph, Fragomeni Adriano, Fu Qichen, Gebreselasie Abrham, Gonzalez Cristina, Hillis James, Huang Xuhua, Huang Yifei, Jia Wenqi, Khoo Weslie, Kolar Jachym, Kottur Satwik, Kumar Anurag, Landini Federico, Li Chao, Li Yanghao, Li Zhenqiang, Mangalam Karttikeya, Modhugu Raghava, Munro Jonathan, Murrell Tullie, Nishiyasu Takumi, Price Will, Puentes Paola Ruiz, Ramazanova Merey, Sari Leda, Somasundaram Kiran, Southerland Audrey, Sugano Yusuke, Tao Ruijie, Vo Minh, Wang Yuchen, Wu Xindi, Yagi Takuma, Zhao Ziwei, Zhu Yunyi, Arbelaez Pablo, Crandall David, Damen Dima, Farinella Giovanni Maria, Fuegen Christian, Ghanem Bernard, Ithapu Vamsi Krishna, Jawahar C V, Joo Hanbyul, Kitani Kris, Li Haizhou, Newcombe Richard, Oliva Aude, Park Hyun Soo, Rehg James M, Sato Yoichi, Shi Jianbo, Shou Mike Zheng, Torralba Antonio, Torresani Lorenzo, Yan Mingfei, Malik Jitendra
IEEE Trans Pattern Anal Mach Intell. 2024 Jul 26;PP. doi: 10.1109/TPAMI.2024.3381075.
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards, with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, we aim to push the frontier of first-person perception. Project page: https://ego4d-data.org/.