Levy Joshua, Dimambro Monica, Diallo Alos, Gui Jiang, Shiner Brian, Levis Maxwell
Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
White River Junction VA Medical Center, White River Junction, VT, USA
Pac Symp Biocomput. 2025;30:167-184. doi: 10.1142/9789819807024_0013.
Accurate prediction of suicide risk is crucial for identifying patients with an elevated risk burden and helping ensure that they receive targeted care. The US Department of Veterans Affairs' suicide prediction model relies primarily on structured electronic health record (EHR) data. This approach largely overlooks unstructured EHR data, which could be used to improve predictive accuracy. This study aims to improve the predictive accuracy of suicide risk models by developing a model that incorporates both structured EHR predictors and semantic, NLP-derived variables from unstructured EHR data. XGBoost models were fit to predict suicide risk. The interactions identified by these models were extracted using SHAP, validated using logistic regression models, and added to a ridge regression model, which was then compared with a ridge regression model fit without interactions. By introducing a selection parameter, α, to balance the influence of structured (α=1) and unstructured (α=0) data, we found that intermediate α values achieved optimal performance across various risk strata, improved the performance of the ridge regression approach, and uncovered significant cross-modal interactions between psychosocial constructs and patient characteristics. These interactions highlight how psychosocial risk factors are influenced by individual patient contexts, potentially informing improved risk prediction methods and personalized interventions. Our findings underscore the importance of incorporating nuanced narrative data into predictive models and set the stage for future research that will expand the use of advanced machine learning techniques, including deep learning, to further refine suicide risk prediction methods.
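The abstract does not state how α enters the model, so the following is only an illustrative sketch of one possible reading: two risk estimates, one from structured predictors and one from NLP-derived variables, are mixed on the log-odds scale with weight α. The function `blend_risk` and the example probabilities are hypothetical and are not taken from the study.

```python
import math


def logit(p):
    """Convert a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def blend_risk(p_structured, p_unstructured, alpha):
    """Mix two predicted risks on the log-odds scale.

    ASSUMPTION: this weighting scheme is illustrative only.
    alpha = 1 uses only the structured-data estimate;
    alpha = 0 uses only the unstructured-data estimate.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    z = alpha * logit(p_structured) + (1.0 - alpha) * logit(p_unstructured)
    return sigmoid(z)


# Hypothetical predicted risks for a single patient.
p_struct, p_text = 0.02, 0.08
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"alpha={alpha:.2f} -> blended risk {blend_risk(p_struct, p_text, alpha):.4f}")
```

Under this reading, an intermediate α yields a risk estimate between the two single-modality estimates. Other schemes, such as using α to control how many features are selected from each modality, would be equally consistent with the abstract's wording.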