Sheldrick R Christopher, Benneyan James C, Kiss Ivy Giserman, Briggs-Gowan Margaret J, Copeland William, Carter Alice S
Developmental-Behavioral Pediatrics, Tufts University School of Medicine, Boston, MA, USA.
Healthcare Systems Engineering Institute, Colleges of Engineering and Health Sciences, Northeastern University, Boston, MA, USA.
J Child Psychol Psychiatry. 2015 Sep;56(9):936-48. doi: 10.1111/jcpp.12442. Epub 2015 Jun 19.
The accuracy of any screening instrument designed to detect psychopathology among children is ideally assessed through rigorous comparison to 'gold standard' tests and interviews. Such comparisons typically yield estimates of what we refer to as 'standard indices of diagnostic accuracy', including sensitivity, specificity, positive predictive value (PPV), and negative predictive value. However, whereas these statistics were originally designed to detect binary signals (e.g., diagnosis present or absent), screening questionnaires commonly used in psychology, psychiatry, and pediatrics typically result in ordinal scores. Thus, a threshold or 'cut score' must be applied to these ordinal scores before accuracy can be evaluated using such standard indices. To better understand the tradeoffs inherent in choosing a particular threshold, we discuss the concept of 'threshold probability'. In contrast to PPV, which reflects the probability that a child whose score falls at or above the screening threshold has the condition of interest, threshold probability refers specifically to the likelihood that a child whose score is equal to a particular screening threshold has the condition of interest.
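For concreteness, the following is a minimal sketch, using simulated data and an invented cut score rather than the instruments or thresholds studied here, of how PPV at a cut score, P(diagnosis | score >= cut), differs from the threshold probability defined above, P(diagnosis | score == cut):

```python
"""Hypothetical sketch (not the authors' analysis): contrast PPV at a cut
score with the 'threshold probability' described above, using simulated
ordinal screening scores and binary gold-standard diagnoses."""
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# A latent severity drives both an ordinal screening score (0-40, purely
# illustrative) and a binary gold-standard diagnosis.
severity = rng.normal(size=n)
scores = np.clip(np.round(12 + 6 * severity + rng.normal(scale=4, size=n)), 0, 40)
diagnosis = rng.random(n) < 1 / (1 + np.exp(-2 * (severity - 1.5)))

cut = 20  # illustrative cut score, not a published clinical threshold

positive = scores >= cut
tp = np.sum(positive & diagnosis)
fp = np.sum(positive & ~diagnosis)
fn = np.sum(~positive & diagnosis)
tn = np.sum(~positive & ~diagnosis)

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)   # P(diagnosis | score >= cut): all children at or above the cut
npv = tn / (tn + fn)

# Threshold probability: P(diagnosis | score == cut), the risk for a child
# who lands exactly at the cut score rather than anywhere above it.
threshold_probability = diagnosis[scores == cut].mean()

print(f"sensitivity={sensitivity:.2f}  specificity={specificity:.2f}")
print(f"PPV={ppv:.2f}  NPV={npv:.2f}  P(dx | score == {cut})={threshold_probability:.2f}")
```

Because children scoring exactly at the cut tend to be the least severe of the screen positives, the threshold probability is generally lower than the PPV computed over everyone at or above the cut.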
The diagnostic accuracy and threshold probability of two well-validated behavioral assessment instruments, the Child Behavior Checklist Total Problem Scale and the Strengths and Difficulties Questionnaire total scale, were examined in relation to a structured psychiatric interview in three de-identified datasets.
Although both screening measures were effective in identifying groups of children at elevated risk for psychopathology in all samples (odds ratios ranged from 5.2 to 9.7), children who scored at or near the clinical thresholds that optimized sensitivity and specificity were unlikely to meet criteria for psychopathology on gold standard interviews.
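As a hedged arithmetic illustration (the 2x2 counts below are invented and do not come from the study samples), an odds ratio in the range reported above can coexist with a modest probability of diagnosis among screen-positive children, and a lower probability still for those scoring exactly at the cut:

```python
# Hypothetical 2x2 table for screen-positive vs. screen-negative children
# (counts are invented for illustration; they are not from the study datasets).
tp, fp = 60, 240    # screen-positive: with diagnosis / without diagnosis
fn, tn = 40, 960    # screen-negative: with diagnosis / without diagnosis

odds_ratio = (tp / fp) / (fn / tn)   # (60/240) / (40/960) = 6.0, within the reported 5.2-9.7 range
ppv = tp / (tp + fp)                 # 0.20: P(diagnosis | score >= cut)

# Children scoring exactly at the cut are typically the most numerous and least
# severe of the screen positives, so P(diagnosis | score == cut) tends to be
# lower still than this PPV, which is the pattern the results describe.
print(f"odds ratio = {odds_ratio:.1f}, PPV = {ppv:.2f}")
```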
Our results are consistent with the view that screening instruments should be interpreted probabilistically, with attention to where along the continuum of positive scores an individual falls.