EgoCom: A Multi-Person Multi-Modal Egocentric Communications Dataset.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):6783-6793. doi: 10.1109/TPAMI.2020.3025105. Epub 2023 May 5.

Abstract

Multi-modal datasets in artificial intelligence (AI) often capture a third-person perspective, but our embodied human intelligence evolved with sensory input from the egocentric, first-person perspective. Towards embodied AI, we introduce the Egocentric Communications (EgoCom) dataset to advance the state-of-the-art in conversational AI, natural language, audio speech analysis, computer vision, and machine learning. EgoCom is a first-of-its-kind natural conversations dataset containing multi-modal human communication data captured simultaneously from the participants' egocentric perspectives. EgoCom includes 38.5 hours of synchronized embodied stereo audio, egocentric video with 240,000 ground-truth, time-stamped word-level transcriptions and speaker labels from 34 diverse speakers. We study baseline performance on two novel applications that benefit from embodied data: (1) predicting turn-taking in conversations and (2) multi-speaker transcription. For (1), we investigate Bayesian baselines to predict turn-taking within 5 percent of human performance. For (2), we use simultaneous egocentric capture to combine Google speech-to-text outputs, improving global transcription by 79 percent relative to a single perspective. Both applications exploit EgoCom's synchronous multi-perspective data to augment performance of embodied AI tasks.
