Department of Business Analytics, University of Iowa, Iowa City, USA.
Interdisciplinary Graduate Program in Informatics, University of Iowa, Iowa City, USA.
BMC Med Inform Decis Mak. 2022 Apr 29;22(1):115. doi: 10.1186/s12911-022-01854-1.
While multiple randomized controlled trials (RCTs) are available, their results may not generalize to older, less healthy, or less adherent patients. Observational data can be used to predict outcomes and evaluate treatments; however, it remains unclear which analytic strategy is best suited to evaluating treatment outcomes from observational data. This study aimed to determine the most accurate machine learning technique for predicting 1-year survival of elderly patients after an initial acute myocardial infarction (AMI) and to identify the association of angiotensin-converting enzyme inhibitors and angiotensin-receptor blockers (ACEi/ARBs) with survival.
We built a cohort of 124,031 Medicare beneficiaries who experienced an AMI in 2007 or 2008. For analytical purposes, all variables were grouped into nine categories: ACEi/ARB use, demographics, cardiac events, comorbidities, complications, procedures, medications, insurance, and healthcare utilization. Our outcome of interest was 1-year post-AMI survival. To solve this classification task, we used lasso logistic regression (LLR) and random forest (RF), and compared their performance across category selections, sampling methods, and hyper-parameter settings. Nested 10-fold cross-validation was used to obtain an unbiased estimate of predictive performance. We used the area under the receiver operating characteristic curve (AUC) as our primary measure of predictive performance.
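The nested cross-validation comparison described above can be sketched as follows. This is a minimal illustration on synthetic data and not the study's pipeline: the scikit-learn estimators, the hyper-parameter grids, the helper name `nested_cv_auc`, and the reduced fold counts in the demo call are all assumptions made to keep the example small and fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def nested_cv_auc(estimator, param_grid, X, y, n_outer=10, n_inner=10, seed=0):
    """Return one AUC per outer fold (hypothetical helper).

    The inner loop tunes hyper-parameters on the outer training split only,
    so the outer test fold never influences model selection.
    """
    inner = StratifiedKFold(n_splits=n_inner, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=n_outer, shuffle=True, random_state=seed + 1)
    search = GridSearchCV(estimator, param_grid, scoring="roc_auc", cv=inner)
    return cross_val_score(search, X, y, scoring="roc_auc", cv=outer)


# Synthetic stand-in for the cohort; y = 1 plays the role of "survived 1 year".
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5,
    weights=[0.3, 0.7], random_state=0,
)

# Lasso (L1-penalized) logistic regression; the C grid is an assumed example.
llr = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear"),
)
llr_grid = {"logisticregression__C": [0.01, 0.1, 1.0]}

# Random forest; the depth grid is an assumed example.
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf_grid = {"max_depth": [3, None]}

# 3 x 3 folds here only for speed; the defaults give the 10-fold setup.
llr_auc = nested_cv_auc(llr, llr_grid, X, y, n_outer=3, n_inner=3)
rf_auc = nested_cv_auc(rf, rf_grid, X, y, n_outer=3, n_inner=3)

print(f"LLR mean AUC: {llr_auc.mean():.3f}")
print(f"RF  mean AUC: {rf_auc.mean():.3f}")
```

Wrapping the grid search inside `cross_val_score` is what makes the procedure nested: each outer fold refits the search from scratch, so the reported AUC is not inflated by tuning on the evaluation data.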
LLR consistently showed the best AUC results throughout the experiments, closely followed by RF. The best prediction was obtained with LLR using the combination of demographics, comorbidities, procedures, and utilization. The coefficients from the final LLR model showed that AMI patients who have many comorbidities, are older, or live in a low-income area have a higher risk of mortality within 1 year after an AMI. In addition, ACEi/ARB use after an AMI was associated with a higher 1-year survival rate.
Across the many features we examined, ACEi/ARBs were associated with increased 1-year survival among elderly patients after an AMI. LLR outperformed RF in predicting 1-year survival after an AMI. LLR greatly improved the generalization of the model through feature selection, which suggests that the association between AMI-related variables and survival can be captured by a relatively simple model with a small number of features. Some comorbidities, such as heart failure and chronic kidney disease, were associated with a greater risk of mortality, whereas others, such as hypertension, hyperlipidemia, and diabetes, were associated with survival. In addition, patients who live in urban areas and in areas with large numbers of immigrants have a higher probability of survival. Machine learning methods can help estimate outcomes when RCT results are not available.
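To illustrate how a lasso penalty performs feature selection and how the sign of a retained coefficient is read, here is a small sketch on synthetic data. The feature indices are placeholders rather than the study's variables, and the penalty strength `C=0.05` is an arbitrary assumption; with the outcome coded as 1 = survived, an odds ratio above 1 points toward survival and one below 1 toward mortality.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: 30 candidate features, only 4 of which carry signal.
X, y = make_classification(
    n_samples=500, n_features=30, n_informative=4, n_redundant=0,
    n_clusters_per_class=1, random_state=0,
)
X = StandardScaler().fit_transform(X)  # put features on a common scale

# L1-penalized logistic regression; a smaller C means a stronger penalty.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X, y)

coef = model.coef_.ravel()
kept = np.flatnonzero(coef)        # features the lasso did not shrink to zero
odds_ratios = np.exp(coef[kept])   # per-standard-deviation odds ratios

print(f"{kept.size} of {coef.size} features retained")
for idx, odds in zip(kept, odds_ratios):
    direction = "higher" if odds > 1 else "lower"
    print(f"feature_{idx}: OR = {odds:.2f} ({direction} odds of y = 1)")
```

These coefficients describe associations in the fitted model only; on observational data they do not by themselves establish that a feature causes the outcome.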