Ainiwaer Aikeliyaer, Hou Wen Qing, Kadier Kaisaierjiang, Rehemuding Rena, Liu Peng Fei, Maimaiti Halimulati, Qin Lian, Ma Xiang, Dai Jian Guo
Department of Cardiology, The First Affiliated Hospital of Xinjiang Medical University, 830011 Urumqi, Xinjiang, China.
College of Information Science and Technology, Shihezi University, 832003 Shihezi, Xinjiang, China.
Rev Cardiovasc Med. 2023 Jun 8;24(6):168. doi: 10.31083/j.rcm2406168. eCollection 2023 Jun.
Although machine learning (ML)-based prediction of coronary artery disease (CAD) has gained increasing attention, assessment of the severity of suspected CAD in symptomatic patients remains challenging.
The training set for this study consisted of 284 retrospectively enrolled participants, and the test set consisted of 116 prospectively enrolled participants; for all participants we collected 53 baseline variables and coronary angiography results. The data were preprocessed with outlier handling and one-hot encoding. In the first stage, we constructed an ML model that used baseline information to predict the presence of CAD as a binary (dichotomous) classification task. In the second stage, baseline information was used to construct ML regression models for predicting the severity of CAD; the non-CAD population was included in these models, and two different scores (the SYNTAX and GENSINI scores) were used as output variables. Finally, statistical analysis and SHAP plot visualization were used to explore the relationship between baseline information and CAD.
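The preprocessing step is described only at a high level. As a minimal sketch, assuming an IQR-based clipping rule (the abstract does not say which outlier method was used) and assuming that clipping bounds and category lists are fitted on the training set and then reused unchanged on the test set, the two steps could look like this. All function names are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(train_values, k=1.5):
    """Fit clipping bounds [Q1 - k*IQR, Q3 + k*IQR] on training data (assumed outlier rule)."""
    q1, _, q3 = quantiles(train_values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip(values, bounds):
    """Clip each value to the previously fitted bounds instead of dropping the row."""
    lo, hi = bounds
    return [min(max(v, lo), hi) for v in values]


def fit_categories(train_column):
    """Record the category levels seen in the training set, in a stable order."""
    return sorted(set(train_column))


def one_hot(column, categories):
    """Expand a categorical column into 0/1 indicator columns, one per training category.

    A level never seen during training maps to an all-zero row.
    """
    return [[1 if v == c else 0 for c in categories] for v in column]
```

Fitting on the training data and only applying to the test data keeps the prospective test set out of the preprocessing fit, which matters when the test set is used as an independent evaluation. For example, bounds fitted on `[1, 2, 3, 4, 100]` are `(-1.0, 7.0)`, so the value 100 is clipped to 7.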
The study included 269 CAD patients and 131 healthy controls. Among the models evaluated for predicting CAD, the eXtreme Gradient Boosting (XGBoost) model performed best, with an area under the receiver operating characteristic curve of 0.728 (95% CI 0.623-0.824). The main correlates were left ventricular ejection fraction, homocysteine, and hemoglobin (P < 0.001). XGBoost also performed best for predicting the SYNTAX score, with brain natriuretic peptide (BNP), left ventricular ejection fraction, and glycated hemoglobin as the main correlates (P < 0.001). In the model predicting the GENSINI score, the main relevant features were BNP, high-density lipoprotein, and homocysteine (P < 0.001).
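The abstract reports the AUC with a 95% CI but does not say how the interval was obtained. One common option is a percentile bootstrap over the test set; the sketch below assumes that approach and computes the AUC through its rank (Mann-Whitney) interpretation: the probability that a randomly chosen CAD case receives a higher predicted score than a randomly chosen non-CAD case. Function names, the number of resamples, and the seed are illustrative assumptions, not the authors' settings.

```python
import random


def auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) CI (assumed method; labels are 0/1)."""
    point = auc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack either class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, (lo, hi)
```

Resampling whole subjects (label and score together) preserves the pairing between outcome and prediction, and fixing the seed makes the interval reproducible across runs.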
This data-driven approach provides a foundation for the risk stratification and severity assessment of CAD.
The study was registered in the www.clinicaltrials.gov protocol registration system (number NCT05018715).