Kirdeev Alexander, Burkin Konstantin, Vorobev Anton, Zbirovskaya Elena, Lifshits Galina, Nikolaev Konstantin, Zelenskaya Elena, Donnikov Maxim, Kovalenko Lyudmila, Urvantseva Irina, Poptsova Maria
Faculty of Computer Science, AI and Digital Science Institute, International Laboratory of Bioinformatics, Higher School of Economics University, Moscow, Russia.
Department of Cardiology, Surgut State University, Surgut, Russia.
Front Med (Lausanne). 2024 Sep 5;11:1452239. doi: 10.3389/fmed.2024.1452239. eCollection 2024.
The development of prognostic models for the identification of high-risk myocardial infarction (MI) patients is a crucial step toward personalized medicine. Genetic factors are known to be associated with an increased risk of cardiovascular diseases; however, little is known about whether they can be used to predict major adverse cardiac events (MACEs) for MI patients. This study aimed to build a machine learning (ML) model to predict MACEs in MI patients based on clinical, imaging, laboratory, and genetic features and to assess the influence of genetics on the prognostic power of the model.
We analyzed data from 218 MI patients admitted to the emergency department at the Surgut District Center for Diagnostics and Cardiovascular Surgery, Russia. Upon admission, standard clinical measurements and imaging data were collected for each patient. Additionally, patients were genotyped for the VEGFR-2 variant rs2305948 (C/C, C/T, T/T genotypes, with T being the minor risk allele). The study included a 9-year follow-up period during which major ischemic events were recorded. We trained and evaluated various ML models, including Gradient Boosting, Random Forest, Logistic Regression, and AutoML. For feature importance analysis, we applied the sequential feature selection (SFS) and SHapley Additive exPlanations (SHAP) methods.
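As an illustration of the sequential feature selection step, the following is a minimal sketch of greedy forward selection scored by ROC AUC. It is not the study's pipeline: the names `roc_auc`, `forward_sfs`, and `toy_score` are hypothetical, the data are synthetic, and the "model" is a simple sum of the selected columns standing in for a trained classifier such as CatBoost evaluated with cross-validation.

```python
from typing import Callable, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def forward_sfs(
    n_features: int,
    score_fn: Callable[[List[int]], float],
    k: int,
) -> List[int]:
    """Greedy forward selection: add the feature that most improves the score."""
    selected: List[int] = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        # Try each remaining feature together with the ones already chosen.
        best = max(remaining, key=lambda j: score_fn(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic toy data (assumption): 6 patients, 3 numeric features, binary outcome.
X = [
    [3, 0, 1],
    [2, 1, 0],
    [3, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
y = [1, 1, 1, 0, 0, 0]


def toy_score(subset: List[int]) -> float:
    # Stand-in for "train a model on this subset and evaluate it":
    # the risk score is just the sum of the selected columns.
    scores = [sum(row[j] for j in subset) for row in X]
    return roc_auc(y, scores)


selected = forward_sfs(n_features=3, score_fn=toy_score, k=2)
print("selected feature indices:", selected)
```

In practice `score_fn` would wrap model training and validation, so each candidate subset is judged by held-out performance rather than by fit on the training data.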
The CatBoost algorithm, with features selected using the SFS method, showed the best performance on the test cohort, achieving a ROC AUC of 0.813. Feature importance analysis identified the dose of statins as the most important factor, with the VEGFR-2 genotype among the top 5. Other important features were coronary artery lesions (coronary artery stenoses ≥70%), left ventricular (LV) parameters such as lateral LV wall and LV mass, diabetes, type of revascularization (CABG or PCI), and age. We also showed that feature contributions are additive and that high risk can result from the cumulative negative effects of different prognostic factors.
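The additivity of contributions can be illustrated with exact Shapley values on a toy model. This is a sketch under stated assumptions: the function `shapley_values`, the model `toy_risk`, and all numbers are hypothetical, and "absent" features are replaced by a fixed baseline value rather than averaged over a background dataset as SHAP implementations typically do.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x relative to a fixed baseline input."""
    n = len(x)

    def value(subset: set) -> float:
        # Features in `subset` take the patient's values, the rest the baseline's.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for r in range(n):  # size of the coalition that feature i joins
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for s in combinations(others, r):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def toy_risk(z: Sequence[float]) -> float:
    # Hypothetical risk score with one interaction term (not the study's model).
    return 0.1 + 0.2 * z[0] + 0.3 * z[1] + 0.1 * z[0] * z[2]


x = [1.0, 1.0, 1.0]         # hypothetical patient with three risk factors present
baseline = [0.0, 0.0, 0.0]  # reference input with all three absent

phi = shapley_values(toy_risk, x, baseline)

# Additivity: baseline prediction plus all contributions recovers the prediction.
reconstructed = toy_risk(baseline) + sum(phi)
print("contributions:", phi)
print("reconstructed:", reconstructed, "prediction:", toy_risk(x))
```

Because the contributions sum to the difference between the prediction and the baseline, several moderate unfavorable factors can add up to a high predicted risk even when no single factor dominates.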
Our ML-based approach demonstrated that the VEGFR-2 genotype is associated with an increased risk of MACEs in MI patients. However, the risk can be significantly reduced by high-dose statins and positive factors such as the absence of coronary artery lesions, absence of diabetes, and younger age.