Ross Elsie Gyang, Shah Nigam H, Dalman Ronald L, Nead Kevin T, Cooke John P, Leeper Nicholas J
Division of Vascular Surgery, Stanford Health Care, Stanford, Calif.
Center for Biomedical Informatics Research, Stanford University, Stanford, Calif.
J Vasc Surg. 2016 Nov;64(5):1515-1522.e3. doi: 10.1016/j.jvs.2016.04.026. Epub 2016 Jun 3.
A key aspect of the precision medicine effort is the development of informatics tools that can analyze and interpret "big data" sets in an automated and adaptive fashion while providing accurate and actionable clinical information. The aims of this study were to develop machine learning algorithms for the identification of disease and the prognostication of mortality risk and to determine whether such models perform better than classical statistical analyses.
Focusing on peripheral artery disease (PAD), patient data were derived from a prospective, observational study of 1755 patients who presented for elective coronary angiography. We employed multiple supervised machine learning algorithms and used diverse clinical, demographic, imaging, and genomic information in a hypothesis-free manner to build models that could identify patients with PAD and predict future mortality. Comparison was made to standard stepwise logistic regression models.
Our machine-learned models outperformed stepwise logistic regression models both for the identification of patients with PAD (area under the curve, 0.87 vs 0.76, respectively; P = .03) and for the prediction of future mortality (area under the curve, 0.76 vs 0.65, respectively; P = .10). Both machine-learned models were markedly better calibrated than the stepwise logistic regression models, thus providing more accurate disease and mortality risk estimates.
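The two quantities compared above, discrimination (area under the curve) and the accuracy of the predicted risks, can be illustrated with a minimal sketch. The abstract does not state which calibration measure was used, so the Brier score here is an assumption chosen for simplicity; it reflects both calibration and discrimination rather than calibration alone. The labels and predicted risks are hypothetical toy values, not study data.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the Mann-Whitney statistic:
    the fraction of (positive, negative) pairs in which the positive case
    receives the higher score, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared difference between predicted risk and observed outcome
    (lower is better). Assumed here as a stand-in accuracy measure."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Hypothetical toy data: 1 = disease present, 0 = disease absent.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
model_a = [0.9, 0.8, 0.7, 0.2, 0.3, 0.1, 0.4, 0.6]    # illustrative "model A" risks
model_b = [0.6, 0.4, 0.7, 0.5, 0.3, 0.2, 0.65, 0.55]  # illustrative "model B" risks

auc_a, auc_b = auc(labels, model_a), auc(labels, model_b)
brier_a, brier_b = brier_score(labels, model_a), brier_score(labels, model_b)
print(f"AUC: A={auc_a:.2f}, B={auc_b:.2f}")
print(f"Brier: A={brier_a:.3f}, B={brier_b:.3f}")
```

A higher area under the curve means the model ranks affected patients above unaffected ones more often. A lower Brier score means the predicted risks sit closer to the observed outcomes. Testing whether two areas under the curve differ significantly, as the reported P values do, requires a separate procedure that this sketch does not cover.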
Machine learning approaches can produce more accurate disease classification and prediction models. These tools may prove clinically useful for the automated identification of patients with highly morbid diseases for which aggressive risk factor management can improve outcomes.