Ross Elsie Gyang, Jung Kenneth, Dudley Joel T, Li Li, Leeper Nicholas J, Shah Nigam H
Division of Vascular Surgery (E.G.R., N.J.L.), Stanford University School of Medicine, Stanford, CA.
Center for Biomedical Informatics Research (K.J., N.H.S., E.G.R.), Stanford University School of Medicine, Stanford, CA.
Circ Cardiovasc Qual Outcomes. 2019 Mar;12(3):e004741. doi: 10.1161/CIRCOUTCOMES.118.004741.
Patients with peripheral artery disease (PAD) are at risk of major adverse cardiac and cerebrovascular events. There are no readily available risk scores that can accurately identify which patients are most likely to sustain an event, making it difficult to identify those who might benefit from more aggressive intervention. Thus, we aimed to develop a novel predictive model, using machine learning methods on electronic health record data, to identify which PAD patients are most likely to develop major adverse cardiac and cerebrovascular events.
Data were derived from patients diagnosed with PAD at 2 tertiary care institutions. Predictive models were built using a common data model that allowed for utilization of both structured (coded) and unstructured (text) data. Only data from time of entry into the health system up to PAD diagnosis were used for modeling. Models were developed and tested using nested cross-validation. A total of 7686 patients were included in training the predictive models. Using almost 1000 variables, our best predictive model identified which PAD patients would go on to develop major adverse cardiac and cerebrovascular events with an area under the curve of 0.81 (95% CI, 0.80-0.83).
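To make the evaluation scheme concrete, the following is a minimal sketch of nested cross-validation scored by area under the curve. It is not the authors' pipeline: the abstract does not name the learning algorithm, the tuning grid, or the fold counts, so the toy linear scorer (`fit`), its `shrink` hyperparameter, the synthetic data from `make_data`, and the 5 outer and 3 inner folds are all assumptions chosen for illustration. The point it shows is that the hyperparameter is selected only on inner folds, and each outer fold is used once for an untouched performance estimate.

```python
import random
from statistics import mean

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum identity (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_folds(indices, k, rng):
    """Shuffle the indices and deal them into k roughly equal folds."""
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit(X, y, rows, shrink):
    """Toy model (an assumption): soft-thresholded difference of class means."""
    pos = [X[i] for i in rows if y[i] == 1]
    neg = [X[i] for i in rows if y[i] == 0]
    weights = []
    for j in range(len(X[0])):
        diff = mean(r[j] for r in pos) - mean(r[j] for r in neg)
        sign = 1 if diff > 0 else -1
        weights.append(sign * max(abs(diff) - shrink, 0.0))
    return weights

def predict(weights, X, rows):
    """Linear risk score for each requested row."""
    return [sum(w * x for w, x in zip(weights, X[i])) for i in rows]

def nested_cv(X, y, grid, outer_k=5, inner_k=3, seed=0):
    """Return one held-out AUC per outer fold."""
    rng = random.Random(seed)
    outer = k_folds(range(len(X)), outer_k, rng)
    outer_scores = []
    for f, test_rows in enumerate(outer):
        train_rows = [i for g, fold in enumerate(outer) if g != f for i in fold]
        inner = k_folds(train_rows, inner_k, rng)

        def inner_auc(shrink):
            # Mean validation AUC over the inner folds for one grid value.
            vals = []
            for h, val_rows in enumerate(inner):
                fit_rows = [i for g, fold in enumerate(inner) if g != h for i in fold]
                w = fit(X, y, fit_rows, shrink)
                vals.append(auc(predict(w, X, val_rows), [y[i] for i in val_rows]))
            return mean(vals)

        best = max(grid, key=inner_auc)      # tuned without seeing test_rows
        w = fit(X, y, train_rows, best)      # refit on the full outer training set
        outer_scores.append(auc(predict(w, X, test_rows), [y[i] for i in test_rows]))
    return outer_scores

def make_data(n=300, d=10, seed=1):
    """Synthetic stand-in data: two informative features, the rest noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(0.8 * label if j < 2 else 0.0, 1.0) for j in range(d)])
        y.append(label)
    return X, y

X, y = make_data()
scores = nested_cv(X, y, grid=[0.0, 0.1, 0.3])
print("outer-fold AUCs:", [round(s, 3) for s in scores])
print("mean AUC:", round(mean(scores), 3))
```

The mean of the outer-fold values is the kind of summary an area under the curve in a study like this would rest on; a confidence interval could be derived from the spread across folds or by bootstrapping, though the abstract does not say which approach was used.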
Machine learning algorithms applied to data in the electronic health record can learn models that accurately identify PAD patients at risk of future major adverse cardiac and cerebrovascular events, highlighting the potential of electronic health records to provide automated risk stratification for cardiovascular diseases. Common data models that enable cross-institution research and technology development may be important for widespread adoption of newer risk-stratification models.