Raccah Omri, Chen Phoebe, Gureckis Todd M, Poeppel David, Vo Vy A
Department of Psychology, Yale University, New Haven, CT, USA.
Department of Psychology, New York University, New York, NY, USA.
Sci Data. 2024 Dec 3;11(1):1317. doi: 10.1038/s41597-024-04082-6.
The "Naturalistic Free Recall" dataset provides transcribed verbal recollections of four spoken narratives collected from 229 participants. Each participant listened to two stories, varying in duration from approximately 8 to 13 minutes, recorded by different speakers. Subsequently, participants were asked to verbally recall the narrative content in as much detail as possible and in the correct order. The dataset includes high-fidelity, time-stamped text transcripts of both the original narratives and participants' recollections. To validate the dataset, we apply a previously published automated method to score memory performance for narrative content. Using this approach, we extend effects traditionally observed in classic list-learning paradigms to narrative recall. The analysis of narrative content and its verbal recollection presents unique challenges compared to controlled list-learning experiments. To facilitate the use of these rich data by the community, we offer an overview of recent computational methods that can be used to annotate and evaluate key properties of narratives and their recollections. Using advancements in machine learning and natural language processing, these methods can help the community understand the role of event structure, discourse properties, prediction error, high-level semantic features (e.g., idioms, humor), and more. All experimental materials, code, and data are publicly available to facilitate new advances in understanding human memory.