Guangzhou Institute of Cardiovascular Disease, Guangdong Key Laboratory of Vascular Diseases, State Key Laboratory of Respiratory Disease, The Second Affiliated Hospital, Guangzhou Medical University, 510260 Guangzhou, Guangdong, China.
Department of Laboratory Medicine, Panyu Hospital of Chinese Medicine, Guangzhou University of Chinese Medicine, 511400 Guangzhou, Guangdong, China.
Front Biosci (Landmark Ed). 2022 Jul 4;27(7):211. doi: 10.31083/j.fbl2707211.
Premature coronary artery disease (PCAD) has a poor prognosis and high rates of mortality and disability. Accurate prediction of PCAD risk is therefore important for prevention and early diagnosis. Machine learning (ML) has proven to be a reliable method for disease diagnosis and for building risk prediction models based on complex factors. The aim of the present study was to develop an accurate prediction model of PCAD risk that allows early intervention.
We performed a retrospective analysis of single nucleotide polymorphisms (SNPs) and traditional cardiovascular risk factors (TCRFs) for 131 PCAD patients and 187 controls. The data were used to construct classifiers for the prediction of PCAD risk with the ML algorithms LogisticRegression (LRC), RandomForestClassifier (RFC) and GradientBoostingClassifier (GBC) in scikit-learn. Three quarters of the participants were randomly assigned to a training dataset and the rest to a test dataset. The performance of the classifiers was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity and concordance index. R packages were used to construct nomograms.
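The workflow described above can be sketched roughly as follows. This is an illustration only, not the authors' code: the data are synthetic (generated with `make_classification` to mimic the 318 participants and an eight-feature combination), and the hyperparameters, random seeds and the 0.5 decision threshold implied by `predict` are assumptions rather than settings reported in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the study data (assumption): 318 samples, 8 features,
# with roughly 187 controls (class 0) and 131 cases (class 1).
X, y = make_classification(
    n_samples=318,
    n_features=8,
    n_informative=5,
    weights=[187 / 318],
    random_state=0,
)

# Three quarters of the samples go to training, the rest to testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The three scikit-learn classifiers named in the abstract.
# max_iter and random_state values are illustrative assumptions.
classifiers = {
    "LRC": LogisticRegression(max_iter=1000),
    "RFC": RandomForestClassifier(random_state=0),
    "GBC": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    # AUC is computed from the predicted probability of the positive class.
    proba = clf.predict_proba(X_test)[:, 1]
    # Sensitivity is the recall of the positive class at the default threshold.
    pred = clf.predict(X_test)
    results[name] = {
        "auc": roc_auc_score(y_test, proba),
        "sensitivity": recall_score(y_test, pred),
    }
    print(f"{name}: AUC={results[name]['auc']:.2f}, "
          f"sensitivity={results[name]['sensitivity']:.2f}")
```

With real data, `X` would hold the encoded SNP genotypes and TCRFs of one feature combination and `y` the PCAD/control labels; how the genotypes were encoded is not stated in the abstract.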
Three optimized feature combinations (FCs) were identified: RS-DT-FC1 (rs2259816, rs1378577, rs10757274, rs4961, smoking, hyperlipidemia, glucose, triglycerides), RS-DT-FC2 (rs1378577, rs10757274, smoking, diabetes, hyperlipidemia, glucose, triglycerides) and RS-DT-FC3 (rs1169313, rs5082, rs9340799, rs10757274, rs1152002, smoking, hyperlipidemia, high-density lipoprotein cholesterol). Classifiers built with these FCs achieved an AUC >0.90 and a sensitivity >0.90. The nomograms built with RS-DT-FC1, RS-DT-FC2 and RS-DT-FC3 had concordance indices of 0.94, 0.94 and 0.90, respectively, when validated with the test dataset, and 0.79, 0.82 and 0.79 when validated with the training dataset. Manual prediction of the test data with the three nomograms gave AUCs of 0.89, 0.92 and 0.83, respectively, and sensitivities of 0.92, 0.96 and 0.86, respectively.
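For readers less familiar with the metrics reported above, the following minimal sketch shows how a concordance index and a sensitivity can be computed by hand for a binary outcome. For a binary outcome the concordance index coincides with the AUC: it is the fraction of case/control pairs in which the case receives the higher predicted risk, with ties counted as one half. The function names, the toy labels and scores, and the 0.5 threshold are hypothetical and are not taken from the study.

```python
def concordance_index(labels, scores):
    """Fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    n_pairs = len(cases) * len(controls)
    if n_pairs == 0:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / n_pairs


def sensitivity(labels, scores, threshold=0.5):
    """True positives divided by all actual positives at the given threshold."""
    true_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    false_neg = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    if true_pos + false_neg == 0:
        raise ValueError("need at least one positive label")
    return true_pos / (true_pos + false_neg)


# Toy example (made-up values): 3 cases and 4 controls with predicted risks.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

c_index = concordance_index(toy_labels, toy_scores)
sens = sensitivity(toy_labels, toy_scores)
print(f"concordance index = {c_index:.3f}, sensitivity = {sens:.3f}")
```

In the toy data, 11 of the 12 case/control pairs are ranked correctly, and 2 of the 3 cases exceed the threshold.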
The selection of suitable features determines the performance of ML models. RS-DT-FC2 may be a suitable FC for building a high-performance prediction model of PCAD with good sensitivity and accuracy. The nomograms allow practical scoring and interpretation of each predictor and may be useful for clinicians in determining the risk of PCAD.