Quinn Molly S, Ford Courtney, Keane Mark T
School of Computer Science, University College Dublin, Belfield, Dublin, Ireland.
ML-Labs, SFI Centre for Data Analytics Research Training in Machine Learning, University College Dublin, Belfield, Dublin, Ireland.
Data Brief. 2022 Aug 17;44:108545. doi: 10.1016/j.dib.2022.108545. eCollection 2022 Oct.
With this article, we present a repository containing datasets, analysis code, and some outputs related to a paper in press at . The data were collected as part of a pre-test, a pilot test, and a main study, all designed in SurveyGizmo, with participants recruited via Prolific.co (combined N=303). The datasets consist of raw and annotated data, in which participant responses are free-text entries describing unexpected events that might occur after a series of events, based on everyday scenarios, presented to them. The code comprises all computational additions to the data and the analyses carried out for the results presented in the article. These data are released for the purpose of transparency and to allow for reproducibility of the work. This human-labelled data should also be of use to machine learning researchers working on text analytics, natural language processing, and sources of common-sense knowledge.