Suh Jae Won, Saunders Rob, Simes Elizabeth, Delamain Henry, Butler Stephen, Cottrell David, Kraam Abdullah, Scott Stephen, Goodyer Ian M, Wason James, Pilling Stephen, Fonagy Peter
CORE Data Lab, Centre for Outcomes Research and Effectiveness, Research Department of Clinical, Educational and Health Psychology, University College London, London, UK.
Research Department of Clinical, Educational and Health Psychology, University College London, London, UK.
Eur Child Adolesc Psychiatry. 2025 May;34(5):1579-1588. doi: 10.1007/s00787-024-02592-7. Epub 2024 Oct 8.
Accurate prediction of short-term offending in young people exhibiting antisocial behaviour could support targeted interventions. Here we develop a set of machine learning (ML) models that predict offending status with good accuracy; furthermore, we show interpretable ML analyses can complement models to inform clinical decision-making.
This study included 679 individuals aged 11-17 years who displayed moderate-to-severe antisocial behaviour, drawn from a controlled trial of multisystemic therapy in England. The outcome was any criminal offence in the 18 months after study baseline. Four types of ML algorithms were trained: logistic regression, elastic net regression, random forest, and gradient boosting machine (GBM). Prediction models were developed (1) using predictors readily available to clinicians (e.g. sociodemographics, previous convictions), and (2) with additional information (e.g. parenting). Model-agnostic feature importance values were calculated to identify the most important predictors. Nested cross-validation with 100 iterations of random data splits and 10-fold cross-validation within each iteration was employed, and the average predictive performance was reported.
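The validation scheme described above (repeated random splits, with 10-fold cross-validation inside each split) can be sketched as follows. This is a minimal illustration on synthetic data, not the study's code. The abstract does not give the 80/20 split ratio, the small hyperparameter grid, the use of scikit-learn, or permutation importance as the model-agnostic importance measure, so all of these are assumptions, as is the helper name `nested_cv_auc`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in data (assumption): the trial data are not reproduced here.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)


def nested_cv_auc(X, y, n_iterations=100, inner_folds=10, seed=0):
    """Outer loop: random train/test splits. Inner loop: k-fold tuning."""
    aucs, importances = [], []
    for i in range(n_iterations):
        # Outer random split; the 80/20 ratio is an assumption.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=seed + i)

        # Inner 10-fold cross-validation, used only to choose hyperparameters.
        inner = StratifiedKFold(n_splits=inner_folds, shuffle=True,
                                random_state=seed + i)
        search = GridSearchCV(
            GradientBoostingClassifier(n_estimators=50, random_state=seed + i),
            param_grid={"max_depth": [1, 2], "learning_rate": [0.05, 0.1]},
            scoring="roc_auc",
            cv=inner,
        )
        search.fit(X_tr, y_tr)

        # Score the tuned model on the held-out part of this iteration.
        proba = search.predict_proba(X_te)[:, 1]
        aucs.append(roc_auc_score(y_te, proba))

        # Permutation importance: one model-agnostic option (assumption).
        perm = permutation_importance(search.best_estimator_, X_te, y_te,
                                      scoring="roc_auc", n_repeats=5,
                                      random_state=seed + i)
        importances.append(perm.importances_mean)

    return np.array(aucs), np.mean(importances, axis=0)


# Three iterations keep the demonstration quick; the study used 100.
demo_aucs, demo_importance = nested_cv_auc(X, y, n_iterations=3)
print("mean AUC:", demo_aucs.mean())
print("mean permutation importance per feature:", demo_importance)
```

Because hyperparameters are chosen only on the inner folds, the outer test AUC is not used for tuning, which is the point of nesting the two loops.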
Among the ML models using readily available predictors, the GBM was the strongest (AUC 0.85, 95% CI 0.85-0.86); the other models had average AUCs of 0.82. This performance was better than that of a model using only the total number of previous offences as the predictor (AUC 0.67, 95% CI 0.66-0.68) and of a model that simply took past offending status as the prediction (AUC 0.81, 95% CI 0.80-0.81). Additional predictors slightly increased the performance of the logistic regression and random forest models but decreased the performance of the elastic net regression and GBM models.
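As a rough illustration of how figures of this kind might be produced, the sketch below computes the AUC of a baseline that uses a binary past-offending flag directly as the prediction, and summarises a set of per-iteration AUCs as a mean with a normal-approximation 95% interval. The abstract does not state how its intervals were derived, so the interval formula, the synthetic data, and the function names `baseline_auc` and `mean_with_ci` are all assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def mean_with_ci(values, z=1.96):
    """Mean and normal-approximation 95% interval of the mean (assumption)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    half_width = z * values.std(ddof=1) / np.sqrt(len(values))
    return mean, mean - half_width, mean + half_width


def baseline_auc(y_true, past_offending):
    """AUC when the binary past-offending flag is itself the prediction."""
    return roc_auc_score(y_true, past_offending)


# Synthetic data (assumption): the outcome matches past status about 80% of the time.
rng = np.random.default_rng(0)
past = rng.integers(0, 2, size=500)
flip = rng.random(500) < 0.2
outcome = np.where(flip, 1 - past, past)

print("baseline AUC:", baseline_auc(outcome, past))

# Hypothetical per-iteration AUCs, for illustration only.
mean, lower, upper = mean_with_ci([0.80, 0.82, 0.84])
print("mean AUC and interval:", mean, lower, upper)
```

A baseline of this kind gives a reference point: a trained model is only useful to the extent that it beats the prediction a clinician could make from offending history alone.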
These findings demonstrate the potential utility of ML approaches for accurately predicting criminal offences in high-risk youth. Interpretable ML-based predictive models could be used in youth services or research to help develop and deliver effective interventions.