Santos Henrique, Mulvehill Alice M, Shen Ke, Kejriwal Mayank, McGuinness Deborah L
Rensselaer Polytechnic Institute, 110 8th St., Troy, NY 12180, USA.
University of Southern California, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292, USA.
Data Brief. 2023 Oct 11;51:109666. doi: 10.1016/j.dib.2023.109666. eCollection 2023 Dec.
Machine commonsense reasoning is the subfield of artificial intelligence that aims to enable machines to behave and make decisions as humans do in ordinary, everyday situations. To measure progress, the community has developed and published benchmarks in the form of question-answering datasets for evaluating machine commonsense models, including large language models. We describe the individual label data, produced by six human annotators, that were originally used to compute the ground truth for the component datasets of the Theoretically-Grounded Commonsense Reasoning (TG-CSR) benchmark. Annotators were given spreadsheets containing the original TG-CSR prompts and, following a set of instructions, inserted labels into specific spreadsheet cells during annotation sessions. The TG-CSR data are organized in JSON files, the individual raw label data in a spreadsheet file, and the individual normalized label data in JSONL files. Releasing the individual labels enables analysis of the labeling process itself, including studies of noise and consistency across annotators.
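As an illustration of the kind of consistency analysis that per-annotator labels make possible, the following is a minimal Python sketch that reads JSONL records and computes pairwise agreement and Cohen's kappa between two annotators. It is not the authors' procedure. The field names `item_id`, `annotator`, and `label` are hypothetical and would need to be adapted to the schema of the released files, and the sample records are synthetic, not taken from TG-CSR.

```python
import json
from collections import Counter, defaultdict
from itertools import combinations

# Synthetic JSONL lines for illustration only; the field names are assumptions,
# not the schema of the released TG-CSR label files.
SAMPLE_LINES = [
    '{"item_id": "q1", "annotator": "A", "label": 1}',
    '{"item_id": "q1", "annotator": "B", "label": 1}',
    '{"item_id": "q2", "annotator": "A", "label": 0}',
    '{"item_id": "q2", "annotator": "B", "label": 1}',
    '{"item_id": "q3", "annotator": "A", "label": 0}',
    '{"item_id": "q3", "annotator": "B", "label": 0}',
    '{"item_id": "q4", "annotator": "A", "label": 1}',
    '{"item_id": "q4", "annotator": "B", "label": 1}',
]


def load_labels(lines):
    """Parse JSONL lines into {item_id: {annotator: label}}."""
    by_item = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        by_item[rec["item_id"]][rec["annotator"]] = rec["label"]
    return dict(by_item)


def pairwise_agreement(by_item):
    """Fraction of co-labeled items on which each annotator pair agrees."""
    agree = defaultdict(int)
    total = defaultdict(int)
    for labels in by_item.values():
        for a, b in combinations(sorted(labels), 2):
            total[(a, b)] += 1
            agree[(a, b)] += int(labels[a] == labels[b])
    return {pair: agree[pair] / total[pair] for pair in total}


def cohen_kappa(by_item, a, b):
    """Cohen's kappa for annotators a and b over the items both labeled.

    Returns None when there are no shared items or when chance agreement is 1.
    """
    pairs = [(l[a], l[b]) for l in by_item.values() if a in l and b in l]
    n = len(pairs)
    if n == 0:
        return None
    p_observed = sum(x == y for x, y in pairs) / n
    count_a = Counter(x for x, _ in pairs)
    count_b = Counter(y for _, y in pairs)
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_expected == 1:
        return None
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    by_item = load_labels(SAMPLE_LINES)
    print(pairwise_agreement(by_item))
    print(cohen_kappa(by_item, "A", "B"))
```

To apply the sketch to a real file, the lines of an open file handle can be passed to `load_labels` in place of `SAMPLE_LINES`. Cohen's kappa covers only two annotators at a time; with six annotators, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha may be a better fit.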