The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines.

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2021 Nov;43(11):4125-4141. doi: 10.1109/TPAMI.2020.2991965. Epub 2021 Oct 1.

DOI: 10.1109/TPAMI.2020.2991965
PMID: 32365017
Abstract

Since its introduction in 2018, EPIC-KITCHENS has attracted attention as the largest egocentric video benchmark, offering a unique viewpoint on people's interaction with objects, their attention, and even intention. In this paper, we detail how this large-scale dataset was captured by 32 participants in their native kitchen environments, and densely annotated with actions and object interactions. Our videos depict non-scripted daily activities, as recording started every time a participant entered their kitchen. Recording took place in four countries, with participants of ten different nationalities, resulting in highly diverse kitchen habits and cooking styles. Our dataset features 55 hours of video consisting of 11.5M frames, which we densely labelled for a total of 39.6K action segments and 454.2K object bounding boxes. Our annotation is unique in that we had the participants narrate their own videos (after recording), thus reflecting true intention, and we crowd-sourced ground-truths based on these. We describe our object, action and anticipation challenges, and evaluate several baselines over two test splits, seen and unseen kitchens. We introduce new baselines that highlight the multimodal nature of the dataset and the importance of explicit temporal modelling to discriminate fine-grained actions (e.g., 'closing a tap' from 'opening' it).
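
For readers who want a concrete sense of the annotation format: the action segments described above are released by the authors as CSV files, one row per narrated segment. The short Python sketch below is only illustrative; it assumes the EPIC-KITCHENS-55 annotation file name EPIC_train_action_labels.csv and column names (video_id, start_timestamp, stop_timestamp, verb, noun) from the public annotation release, which are not stated in this record.

import pandas as pd

# Load the narrated action segments (one row per segment).
# File and column names follow the public EPIC-KITCHENS-55 annotation
# release (an assumption here) and may differ in later versions.
labels = pd.read_csv("EPIC_train_action_labels.csv")

# Each segment carries a video id, start/stop timestamps and a (verb, noun)
# pair parsed from the participant's own narration.
print(labels[["video_id", "start_timestamp", "stop_timestamp", "verb", "noun"]].head())

# Visually similar verb pairs such as 'open' vs. 'close' are the kind of
# fine-grained distinction the temporal-modelling baselines target.
open_close = labels[labels["verb"].isin(["open", "close"])]
print(open_close.groupby(["verb", "noun"]).size().sort_values(ascending=False).head(10))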


Similar Articles

1. The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines.
   IEEE Trans Pattern Anal Mach Intell. 2021 Nov;43(11):4125-4141. doi: 10.1109/TPAMI.2020.2991965. Epub 2021 Oct 1.
2. Forecasting Action Through Contact Representations From First Person Video.
   IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):6703-6714. doi: 10.1109/TPAMI.2021.3055233. Epub 2023 May 5.
3. Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video.
   IEEE Trans Pattern Anal Mach Intell. 2021 Nov;43(11):4021-4036. doi: 10.1109/TPAMI.2020.2992889. Epub 2021 Oct 1.
4. In the Eye of the Beholder: Gaze and Actions in First Person Video.
   IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):6731-6747. doi: 10.1109/TPAMI.2021.3051319. Epub 2023 May 8.
5. Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges.
   IEEE Trans Pattern Anal Mach Intell. 2022 Nov;44(11):7778-7796. doi: 10.1109/TPAMI.2021.3117983. Epub 2022 Oct 4.
6. Learning to Recognize Actions on Objects in Egocentric Video With Attention Dictionaries.
   IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):6674-6687. doi: 10.1109/TPAMI.2021.3058649. Epub 2023 May 5.
7. Fine-Grained Video Captioning via Graph-based Multi-Granularity Interaction Learning.
   IEEE Trans Pattern Anal Mach Intell. 2022 Feb;44(2):666-683. doi: 10.1109/TPAMI.2019.2946823. Epub 2022 Jan 7.
8. Ego4D: Around the World in 3,000 Hours of Egocentric Video.
   IEEE Trans Pattern Anal Mach Intell. 2024 Jul 26;PP. doi: 10.1109/TPAMI.2024.3381075.
9. Moments in Time Dataset: One Million Videos for Event Understanding.
   IEEE Trans Pattern Anal Mach Intell. 2020 Feb;42(2):502-508. doi: 10.1109/TPAMI.2019.2901464. Epub 2019 Feb 25.
10. Learning to Anticipate Egocentric Actions by Imagination.
   IEEE Trans Image Process. 2021;30:1143-1152. doi: 10.1109/TIP.2020.3040521. Epub 2020 Dec 17.

Cited By

1. FoodSky: A food-oriented large language model that can pass the chef and dietetic examinations.
   Patterns (N Y). 2025 Apr 22;6(5):101234. doi: 10.1016/j.patter.2025.101234. eCollection 2025 May 9.
2. A Review of Embodied Grasping.
   Sensors (Basel). 2025 Jan 30;25(3):852. doi: 10.3390/s25030852.
3. Neural Feature Fusion Fields: 3D Distillation of Self-Supervised 2D Image Representations.
   Proc Int Conf 3D Vis. 2023 Feb 22;2022. doi: 10.1109/3DV57658.2022.00056. eCollection 2022 Sep 16.
4. Semantic-aware Video Representation for Few-shot Action Recognition.
   IEEE Winter Conf Appl Comput Vis. 2024 Jan;2024:6444-6454. doi: 10.1109/wacv57701.2024.00633. Epub 2024 Apr 9.
5. The real-time hand and object recognition for virtual interaction.
   PeerJ Comput Sci. 2024 Jun 27;10:e2110. doi: 10.7717/peerj-cs.2110. eCollection 2024.
6. IoT and Deep Learning-Based Farmer Safety System.
   Sensors (Basel). 2023 Mar 8;23(6):2951. doi: 10.3390/s23062951.
7. Cooktop Sensing Based on a YOLO Object Detection Algorithm.
   Sensors (Basel). 2023 Mar 3;23(5):2780. doi: 10.3390/s23052780.
8. Visual Object Tracking in First Person Vision.
   Int J Comput Vis. 2023;131(1):259-283. doi: 10.1007/s11263-022-01694-6. Epub 2022 Oct 18.
9. A Review of Multi-Modal Learning from the Text-Guided Visual Processing Viewpoint.
   Sensors (Basel). 2022 Sep 8;22(18):6816. doi: 10.3390/s22186816.
10. Visual Feature Learning on Video Object and Human Action Detection: A Systematic Review.
   Micromachines (Basel). 2021 Dec 31;13(1):72. doi: 10.3390/mi13010072.