School of Business, State University of New York at New Paltz, New Paltz, NY, USA.
Department of Computer Science, Northern Kentucky University, Highland Heights, Kentucky, USA.
Int J Med Inform. 2022 Jul;163:104786. doi: 10.1016/j.ijmedinf.2022.104786. Epub 2022 Apr 29.
The ACC/AHA Pooled Cohort Equations (PCE) Risk Calculator is widely used in the US for primary prevention of atherosclerotic cardiovascular disease (ASCVD), but may under- or over-estimate risk in some populations. We therefore designed an automated, population-specific ASCVD risk calculator using machine-learning (ML) methods and electronic medical record (EMR) data, and compared its predictive power with that of the PCE calculator.
We collected data from 101,110 unique EMRs of living patients from January 1, 2009 to April 30, 2020. ML techniques were applied to patient datasets that included either only cross-sectional (CS) features, or CS features combined with longitudinal (LT) features derived from vital statistics and laboratory values. We compared the utility of the models using a proposed new cost measure (Screened Cases Percentage @ Sensitivity level). All ML models tested achieved better predictive power than the PCE risk calculator. The random forest (RF) ML technique applied to the combination of CS and LT features (RF-LTC) produced the best area under the curve (AUC) score of 0.902 (95% confidence interval (CI), 0.895-0.910). To detect 90% of all positive ASCVD cases, the best ML model required screening only 43% of patients, while the PCE risk calculator required screening 69% of patients.
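The abstract does not spell out how the Screened Cases Percentage @ Sensitivity level measure is computed. One plausible reading, offered here only as a minimal sketch, is: rank patients by predicted risk, screen from the highest risk downward, and report the fraction of patients that must be screened to capture the target share of true ASCVD cases. The function name, its parameters, and the toy data below are hypothetical and are not taken from the study.

```python
import math


def screened_pct_at_sensitivity(y_true, scores, sensitivity=0.90):
    """Fraction of patients screened, in descending risk order, needed to
    capture at least `sensitivity` of all positive cases.

    Assumed definition (not confirmed by the source). Ties in `scores`
    are broken by input order, since Python's sort is stable.
    """
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("at least one positive case is required")

    # Number of positives that must be found; the small tolerance guards
    # against floating-point noise pushing an exact product upward.
    needed = max(1, math.ceil(sensitivity * total_pos - 1e-9))

    # Highest predicted risk first.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    found = 0
    for n_screened, (_, label) in enumerate(ranked, start=1):
        found += label
        if found >= needed:
            return n_screened / len(ranked)
    return 1.0  # unreachable when needed <= total_pos; kept for safety


# Toy data: 10 patients, 4 true cases (label 1), scores already descending.
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

pct = screened_pct_at_sensitivity(y_true, scores, sensitivity=0.75)
print(f"Screened fraction at 75% sensitivity: {pct:.2f}")
```

Under this reading, a lower value at a fixed sensitivity means fewer patients need screening to find the same share of cases, which is the sense in which the reported 43% (best ML model) compares favorably with 69% (PCE) at 90% sensitivity.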
Prediction models built using ML techniques improved ASCVD prediction and reduced the number of screenings required to detect ASCVD cases compared with the PCE calculator alone. Combining LT and CS features in the ML models significantly improved ASCVD prediction compared with using CS features alone.