People make mistakes: Obtaining accurate ground truth from continuous annotations of subjective constructs.

Author information

Department of Computer Science, University of Memphis, 38152, Memphis, TN, USA.

Electrical and Computer Engineering Department, University of Southern California, 90089, Los Angeles, CA, USA.

Publication information

Behav Res Methods. 2024 Dec;56(8):8784-8800. doi: 10.3758/s13428-024-02503-3. Epub 2024 Sep 30.

DOI: 10.3758/s13428-024-02503-3

PMID: 39349847

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11525321/

Abstract

Accurately representing changes in mental states over time is crucial for understanding their complex dynamics. However, there is little methodological research on the validity and reliability of human-produced continuous-time annotation of these states. We present a psychometric perspective on valid and reliable construct assessment, examine the robustness of interval-scale (e.g., values between zero and one) continuous-time annotation, and identify three major threats to validity and reliability in current approaches. We then propose a novel ground truth generation pipeline that combines emerging techniques for improving validity and robustness. We demonstrate its effectiveness in a case study involving crowd-sourced annotation of perceived violence in movies, where our pipeline achieves a .95 Spearman correlation in summarized ratings compared to a .15 baseline. These results suggest that highly accurate ground truth signals can be produced from continuous annotations using additional comparative annotation (e.g., a versus b) to correct structured errors, highlighting the need for a paradigm shift in robust construct measurement over time.
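The abstract reports agreement as a Spearman correlation over summarized ratings. The following is a minimal sketch of how such a figure could be computed. It is not the authors' pipeline: the clip names, traces, and reference ranks are hypothetical toy values, and summarizing each continuous trace by its mean is an assumption made only for illustration.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(rank(x), rank(y))


def summarize(trace):
    """Collapse one continuous annotation trace (values in [0, 1]) to its mean.

    Assumption: the mean is used here only as a simple stand-in summary.
    """
    return mean(trace)


# Hypothetical toy data: one continuous trace per clip, plus a reference ranking.
traces = {
    "clip_a": [0.1, 0.2, 0.1],
    "clip_b": [0.4, 0.5, 0.6],
    "clip_c": [0.9, 0.8, 0.7],
    "clip_d": [0.3, 0.3, 0.3],
}
reference = {"clip_a": 1, "clip_b": 3, "clip_c": 4, "clip_d": 2}

clips = sorted(traces)
summaries = [summarize(traces[c]) for c in clips]
rho = spearman(summaries, [reference[c] for c in clips])
print(f"Spearman correlation of summarized ratings: {rho:.2f}")
```

Because Spearman correlation depends only on rank order, any monotonic rescaling of the summarized ratings leaves it unchanged. The measure is therefore sensitive to ordering errors rather than to shifts in scale.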

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a38b/11525321/1c28b3e92ced/13428_2024_2503_Fig1_HTML.jpg
