Department of Counseling and Clinical Psychology.
Department of Psychology.
Psychol Assess. 2019 Aug;31(8):1052-1061. doi: 10.1037/pas0000723. Epub 2019 May 9.
Behavioral measures are increasingly used to assess suicidal thoughts and behaviors. Some measures, such as the Suicide Stroop Task, have yielded mixed findings in the literature. An understudied feature of these behavioral measures has been their psychometric properties, which may affect the probability of detecting significant effects and reproducibility. In the largest investigation of its kind, we tested the internal consistency and concurrent validity of the Suicide Stroop Task in its current form, drawing from seven separate studies (N = 875 participants, 64% female, aged 12 to 81 years). Results indicated that the most common Suicide Stroop scoring approach, interference scores, yielded unacceptably low internal consistency (αs = -.09 to .13) and failed to demonstrate concurrent validity. Internal consistency coefficients for mean reaction times (RTs) to each stimulus type ranged from α = .93 to .94. All scoring approaches for suicide-related interference demonstrated poor classification accuracy (AUCs = .52 to .56), indicating that scores performed near chance in their ability to distinguish suicide attempters from nonattempters. In the case of mean RTs, we did not find evidence for concurrent validity despite our excellent reliability findings, highlighting that reliability does not guarantee a measure is clinically useful. These results are discussed in the context of the wider implications for testing and reporting psychometric properties of behavioral measures in mental health research. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
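The two statistics reported above can be illustrated with a minimal sketch. It assumes the internal consistency coefficient is Cronbach's alpha and computes the AUC as the probability that a randomly chosen attempter scores higher than a randomly chosen nonattempter. The function names and input layout are hypothetical and are not taken from the study's analysis code.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` is a list of equal-length lists: one inner list per item
    (e.g., per trial or stimulus), one value per participant.
    This layout is an assumption made for illustration.
    """
    k = len(items)
    n = len(items[0])
    # Total score per participant, summed across items.
    totals = [sum(item[p] for item in items) for p in range(n)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Returns the proportion of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half.
    A value of 0.5 corresponds to chance-level classification.
    """
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

Read this way, AUCs of .52 to .56 mean that an attempter's score exceeded a nonattempter's score in only slightly more than half of all possible pairings.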