Rossegger Astrid, Endrass Jérôme, Gerth Juliane, Singh Jay P
Department of Mental Health Services, Office of Corrections, Canton of Zurich, Zurich, Switzerland; Department of Psychology, University of Konstanz, Konstanz, Germany.
Department of Mental Health Services, Office of Corrections, Canton of Zurich, Zurich, Switzerland; Institute of Health Sciences, Molde University College, Molde, Norway.
PLoS One. 2014 Mar 14;9(3):e91845. doi: 10.1371/journal.pone.0091845. eCollection 2014.
The performance of violence risk assessment instruments can be investigated primarily by analysing two psychometric properties: discrimination (how well an instrument separates those who reoffend from those who do not) and calibration (how closely its predicted probabilities match observed rates). Although many studies have examined the discrimination capacity of the Violence Risk Appraisal Guide (VRAG) and other actuarial risk assessment tools, few have evaluated how well calibrated these instruments are. The aim of the present investigation was to replicate the VRAG development study in Europe, including measures of both discrimination and calibration.
Using a prospective study design, we assessed an entire cohort of violent offenders in the Canton of Zurich, Switzerland, with the VRAG before their discharge from prisons, secure facilities, and outpatient clinics. Assessors adhered strictly to the assessment protocol set out in the instrument's manual. After accounting for attrition, 206 offenders were followed in the community for a fixed period of 7 years. Charges and convictions for subsequent violent offenses served as the outcomes. Discrimination was measured with receiver operating characteristic (ROC) analysis; calibration was measured with Sanders' decomposition of the Brier score and with Bayesian credible intervals.
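The ROC and Brier-score analyses named above can be sketched in a few lines. This is a minimal illustration, not the study's code. It computes the AUC in its Mann-Whitney form, which is the probability that a randomly chosen reoffender falls in a higher bin than a randomly chosen non-reoffender. It also computes Sanders' two-term decomposition of the Brier score, BS = (1/N) Σ_k n_k (f_k − ō_k)² + (1/N) Σ_k n_k ō_k (1 − ō_k). Here f_k is the probability the tool states for bin k, ō_k is the observed rate in that bin, and n_k is the bin size. The first term measures calibration (smaller is better) and the second measures refinement. The function names, the bin_probs mapping and the toy data are hypothetical. The decomposition is exact here because every member of a bin receives the same forecast.

```python
from collections import defaultdict


def auc_mann_whitney(scores, outcomes):
    """AUC as the probability that a randomly chosen reoffender scores higher
    than a randomly chosen non-reoffender; tied scores count as one half."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one reoffender and one non-reoffender")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sanders_decomposition(bins, outcomes, bin_probs):
    """Split the Brier score into calibration + refinement terms.

    bins:      risk-bin label for each person
    outcomes:  1 if the person reoffended during follow-up, else 0
    bin_probs: hypothetical mapping from bin label to the tool's stated
               probability; every member of a bin gets that same forecast
    """
    n = len(outcomes)
    forecasts = [bin_probs[b] for b in bins]
    brier = sum((f - y) ** 2 for f, y in zip(forecasts, outcomes)) / n

    # Group observed outcomes by risk bin.
    by_bin = defaultdict(list)
    for b, y in zip(bins, outcomes):
        by_bin[b].append(y)

    calibration = refinement = 0.0
    for b, ys in by_bin.items():
        n_k = len(ys)
        observed = sum(ys) / n_k
        # Squared gap between stated and observed rate, weighted by bin size.
        calibration += n_k * (bin_probs[b] - observed) ** 2
        # Outcome variance remaining inside the bin, weighted by bin size.
        refinement += n_k * observed * (1 - observed)
    return brier, calibration / n, refinement / n


# Hypothetical toy data: three bins, not the VRAG's bins or norms.
bins = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
outcomes = [0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1]
bin_probs = {1: 0.1, 2: 0.4, 3: 0.8}

auc = auc_mann_whitney(bins, outcomes)
brier, cal, ref = sanders_decomposition(bins, outcomes, bin_probs)
print(f"AUC={auc:.3f} Brier={brier:.4f} calibration={cal:.4f} refinement={ref:.4f}")
```

On this toy data the AUC is about 0.73, and the calibration term (about 0.016) plus the refinement term (0.1875) reproduces the Brier score (about 0.203). Taking the square root of the calibration term gives a root-mean-square gap between stated and observed bin rates, which is about 0.13 here. Whether the abstract's 21% figure was derived this way is an assumption, not something the abstract states.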
The discrimination of the VRAG's risk bins was modest (area under the curve [AUC] = 0.72, 95% CI = 0.63–0.81, p < 0.05). However, the tool's calibration was poor. Sanders' calibration score suggested an average assessment error of 21% in the probabilistic estimates associated with each bin. For five of the nine risk bins, the Bayesian credible intervals did not contain the expected risk rates.
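The credible-interval check described above can be sketched as follows. The sketch relies on assumptions the abstract does not state: a uniform Beta(1, 1) prior on each bin's recidivism rate, a binomial likelihood, and 95% equal-tailed intervals. The intervals are approximated with seeded Monte Carlo draws so that only the Python standard library is needed. A bin is flagged when its interval excludes the rate the instrument's norms predict for it. The counts, expected rates and function names are hypothetical.

```python
import random


def beta_credible_interval(reoffended, n, level=0.95, prior=(1.0, 1.0),
                           draws=100_000, seed=0):
    """Equal-tailed credible interval for one bin's recidivism rate.

    With a Beta(a, b) prior and r reoffenders out of n, the posterior is
    Beta(a + r, b + n - r). Its quantiles are approximated here by sorting
    Monte Carlo draws (seeded, so results are reproducible).
    """
    a, b = prior
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a + reoffended, b + n - reoffended)
                     for _ in range(draws))
    alpha = 1.0 - level
    lo = samples[round(alpha / 2 * (draws - 1))]
    hi = samples[round((1 - alpha / 2) * (draws - 1))]
    return lo, hi


def bins_missing_expected_rate(bin_counts, expected_rates, **kwargs):
    """Labels of bins whose credible interval does not contain the expected rate."""
    missing = []
    for label, (reoffended, n) in bin_counts.items():
        lo, hi = beta_credible_interval(reoffended, n, **kwargs)
        if not lo <= expected_rates[label] <= hi:
            missing.append(label)
    return missing


# Hypothetical data: (reoffenders, bin size) and the rate each bin's norms predict.
bin_counts = {"A": (1, 20), "B": (10, 20), "C": (18, 20)}
expected_rates = {"A": 0.05, "B": 0.80, "C": 0.90}
print(bins_missing_expected_rate(bin_counts, expected_rates))
```

With these toy counts only bin B is flagged, because ten of twenty reoffended against a predicted rate of 0.80. In practice, an exact Beta quantile function from a statistics library would replace the sampling step.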
The measurement of calibration validity in risk assessment instruments needs to be improved, as has already been done for discrimination. Further replication studies that focus on the calibration of actuarial risk assessment instruments are needed. In the meantime, we recommend caution when using the VRAG's probabilistic risk estimates in practice.