Departments of Epidemiology and Biomedical Data Science, Dartmouth Geisel School of Medicine, Hanover, NH.
Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN.
J Am Heart Assoc. 2022 Apr 5;11(7):e024198. doi: 10.1161/JAHA.121.024198. Epub 2022 Mar 24.
Background: Social risk factors influence rehospitalization rates yet are challenging to incorporate into prediction models. Integrating social risk factors using natural language processing (NLP) and machine learning could improve prediction of 30-day readmission risk following an acute myocardial infarction.
Methods and Results: Patients were enrolled into derivation and validation cohorts. The derivation cohort included inpatient discharges from Vanderbilt University Medical Center between January 1, 2007, and December 31, 2016, with a primary diagnosis of acute myocardial infarction, in patients who were discharged alive and had not been transferred from another facility. The validation cohort included patients from Dartmouth-Hitchcock Health Center between April 2, 2011, and December 31, 2016, who met the same eligibility criteria. Data from both sites were linked to Centers for Medicare & Medicaid Services administrative data to supplement the capture of 30-day hospital readmissions. Clinical notes from each cohort were extracted, and an NLP model was deployed to count mentions of 7 social risk factors. Five machine learning models were run using clinical and NLP-derived variables. Model discrimination and calibration were assessed, and receiver operating characteristic comparison analyses were performed. The 30-day rehospitalization rates in the derivation (n=6165) and validation (n=4024) cohorts were 15.1% (n=934) and 10.2% (n=412), respectively. The derivation models showed no statistically significant improvement in performance with the addition of the selected NLP-derived social risk factors.
Conclusions: Social risk factors extracted using NLP did not significantly improve 30-day readmission prediction among hospitalized patients with acute myocardial infarction. Alternative methods are needed to capture social risk factors.
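The mention-counting step described in the Methods can be illustrated with a minimal sketch. This is not the study's NLP model: the abstract does not name the 7 social risk factors or the terms used to detect them, so the factor names and phrases in `LEXICON`, as well as the helper names `count_mentions` and `patient_features`, are hypothetical placeholders chosen only for illustration. The sketch uses simple case-insensitive phrase matching and does not handle negation ("denies alcohol abuse") or context.

```python
import re

# Hypothetical lexicon: the abstract does not list the 7 factors or their
# search terms, so these entries are illustrative placeholders only.
LEXICON = {
    "housing_instability": ["homeless", "unstable housing"],
    "social_isolation": ["lives alone", "no family support"],
    "alcohol_use": ["alcohol abuse", "heavy drinking"],
}


def count_mentions(note, lexicon=LEXICON):
    """Count case-insensitive whole-phrase matches per factor in one note."""
    text = note.lower()
    counts = {}
    for factor, phrases in lexicon.items():
        # One alternation pattern per factor, bounded by word boundaries.
        pattern = r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b"
        counts[factor] = len(re.findall(pattern, text))
    return counts


def patient_features(notes, lexicon=LEXICON):
    """Sum per-note mention counts into one feature row per patient."""
    totals = {factor: 0 for factor in lexicon}
    for note in notes:
        for factor, n in count_mentions(note, lexicon).items():
            totals[factor] += n
    return totals


if __name__ == "__main__":
    example_notes = [
        "Patient is homeless and lives alone.",
        "Reports heavy drinking; homeless since discharge.",
    ]
    print(patient_features(example_notes))
```

Per-patient counts of this kind could then be appended to the clinical variables as additional model inputs, which is the general design the Methods describe.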