Department of Computer Science, University of Memphis, 38152, Memphis, TN, USA.
Electrical and Computer Engineering Department, University of Southern California, 90089, Los Angeles, CA, USA.
Behav Res Methods. 2024 Dec;56(8):8784-8800. doi: 10.3758/s13428-024-02503-3. Epub 2024 Sep 30.
Accurately representing changes in mental states over time is crucial for understanding their complex dynamics. However, there is little methodological research on the validity and reliability of human-produced continuous-time annotation of these states. We present a psychometric perspective on valid and reliable construct assessment, examine the robustness of interval-scale (e.g., values between zero and one) continuous-time annotation, and identify three major threats to validity and reliability in current approaches. We then propose a novel ground truth generation pipeline that combines emerging techniques for improving validity and robustness. We demonstrate its effectiveness in a case study involving crowd-sourced annotation of perceived violence in movies, where our pipeline achieves a .95 Spearman correlation in summarized ratings compared to a .15 baseline. These results suggest that highly accurate ground truth signals can be produced from continuous annotations using additional comparative annotation (e.g., a versus b) to correct structured errors, highlighting the need for a paradigm shift in robust construct measurement over time.