Lindberg Daniel Martin, Lindsell Christopher John, Shapiro Robert Allan
Department of Emergency Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA.
Pediatrics. 2008 Apr;121(4):e945-53. doi: 10.1542/peds.2007-2485.
In the absence of a gold standard, clinicians and researchers often categorize their opinions of the likelihood of inflicted injury using several ordinal scales. The objective of this protocol was to determine the reliability of expert ratings using several of these scales.
Participants were pediatricians with substantial academic and clinical activity in the evaluation of children with concerns for physical abuse. The facts from several cases that were referred to 1 hospital's child abuse team were abstracted and recorded in the format of a multidisciplinary team conference. Participants viewed the recording and rated each case using several scales of child abuse likelihood.
Participants (n = 22) showed broad variability for most cases on all scales. Variability was lowest for cases with the highest aggregate concern for abuse. One scale that included examples of cases fitting each category and standard reporting language to summarize results showed a modest (18%-23%) decrease in variability among participants. The interpretation of the categories used by the scales was more consistent. Cases were rarely rated as "definite abuse" when likelihood was estimated at ≤ 95%. Only 7 of 156 cases rated ≤ 15% likelihood were rated as "no reasonable concern for abuse." Only 9 of 858 cases rated ≥ 35% likelihood were rated as "reasonable concern for abuse."
Assessments of child abuse likelihood often show broad variability between experts. Although a rating scale with patient examples and standard reporting language may decrease variability, clinicians and researchers should be cautious when interpreting abuse likelihood assessments from a single expert. These data support the peer-review or multidisciplinary team approach to child abuse assessments.