Department of Statistics, University of Connecticut, Storrs, Connecticut, United States of America.
Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, United States of America.
PLoS One. 2023 Apr 26;18(4):e0283595. doi: 10.1371/journal.pone.0283595. eCollection 2023.
Preventing suicide in US youth is of paramount concern, with rates increasing by more than 50% between 2007 and 2018. Statistical modeling using electronic health records may help identify at-risk youth before a suicide attempt. While electronic health records contain diagnostic information, which captures known risk factors, they generally lack or poorly document social determinants (e.g., social support), which are also known risk factors. If statistical models incorporate not only diagnostic records but also social determinant measures, additional at-risk youth may be identified before a suicide attempt.
Suicide attempts were predicted in hospitalized patients, ages 10-24, from the State of Connecticut's Hospital Inpatient Discharge Database (HIDD; N = 38943). Predictors included demographic information, diagnosis codes, and, using a data fusion framework, social determinant features transferred (fused) from an external source of survey data, the National Longitudinal Study of Adolescent to Adult Health (Add Health). Social determinant information for each HIDD patient was generated by averaging values from that patient's most similar Add Health individuals (e.g., the top 10), with similarity computed over the features shared between the two datasets (e.g., using Pearson's r). Attempts were then modelled using an elastic net logistic regression with both HIDD features and fused Add Health features.
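The fusion step described above can be illustrated with a minimal sketch. It assumes that similarity is Pearson's r computed over the shared features and that the fused value is a plain unweighted mean over the top k matches. The function names (`pearson_r`, `fuse_features`), the toy data, and the choice of k are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # A constant profile has no defined correlation; treat it as no similarity.
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def fuse_features(patient_shared, donors_shared, donors_social, k=10):
    """Average the social features of the k survey donors most similar to a patient.

    patient_shared: shared-feature values for one hospital patient
    donors_shared:  shared-feature values for each survey respondent
    donors_social:  social determinant values for each survey respondent
    """
    sims = [pearson_r(patient_shared, d) for d in donors_shared]
    # Indices of the k most similar donors (fewer if there are not k donors).
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    n_feat = len(donors_social[0])
    return [sum(donors_social[i][j] for i in top) / len(top) for j in range(n_feat)]


# Toy example (made-up numbers): one patient, three survey respondents.
patient = [1.0, 2.0, 3.0, 4.0]
donors_shared = [
    [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with the patient
    [1.0, 2.0, 3.0, 5.0],  # strongly correlated
    [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated
]
donors_social = [[1.0, 0.0], [3.0, 1.0], [5.0, 1.0]]

fused = fuse_features(patient, donors_shared, donors_social, k=2)
print(fused)
```

In this toy case the two best matches are the first two respondents, so the fused values are the means of their social features. The fused vector would then be appended to the patient's own features before model fitting.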
The model including fused social determinants outperformed the conventional model (AUC = 0.83 v. 0.82). Sensitivity and positive predictive values at 90% and 95% specificity were almost 10% higher in relative terms when fused features were included (e.g., sensitivity at 90% specificity = 0.48 v. 0.44). Among social determinant variables, the patient's perception that their mother cares and being non-religious appeared particularly important to the performance improvement.
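Sensitivity and positive predictive value at a fixed specificity can be computed as in the following sketch. It assumes one common convention: choose the lowest score threshold whose specificity meets the target, and count a case as predicted positive when its score exceeds that threshold. The function name `metrics_at_specificity` and the toy labels and scores are hypothetical; the study's exact thresholding procedure is not stated here.

```python
def metrics_at_specificity(y_true, scores, target_spec=0.90):
    """Return (threshold, sensitivity, ppv) at the lowest threshold
    whose specificity is at least target_spec.

    y_true: 0/1 labels (1 = attempt); scores: predicted risk per case.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    for thr in sorted(set(scores)):
        # Cases scoring strictly above the threshold are flagged as positive.
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s > thr)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s > thr)
        spec = (n_neg - fp) / n_neg
        if spec >= target_spec:
            sens = tp / n_pos
            ppv = tp / (tp + fp) if (tp + fp) else float("nan")
            return thr, sens, ppv
    raise ValueError("target specificity cannot be reached")


# Toy example (made-up numbers): 10 non-attempts followed by 4 attempts.
y_true = [0] * 10 + [1] * 4
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.80,
          0.32, 0.60, 0.70, 0.90]

thr, sens, ppv = metrics_at_specificity(y_true, scores, target_spec=0.90)
print(thr, sens, ppv)
```

Here a specificity of 90% allows one false positive among the ten non-attempts, which fixes the threshold; sensitivity and positive predictive value are then read off at that threshold.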
This proof-of-concept study showed that incorporating social determinant measures from an external survey database, using a data fusion framework, could improve prediction of youth suicide risk from clinical data. While social determinant data collected directly from patients might be ideal, estimating these characteristics via data fusion avoids the task of data collection, which is generally time-consuming and expensive and suffers from non-compliance.