Peng Mengxiao, Hou Fan, Cheng Zhixiang, Shen Tongtong, Liu Kaixian, Zhao Cai, Zheng Wen
Institute of Public-Safety and Big Data, College of Data Science, Taiyuan University of Technology, University Street, Yuci District, Jinzhong, 030600, China.
Center for Big Data Research in Health, Changzhi Medical College, East Jiefang Street, Changzhi, 046000, China.
Sci Rep. 2023 Mar 23;13(1):4778. doi: 10.1038/s41598-023-31870-8.
Cardiovascular disease (CVD) is a serious health threat worldwide. Using machine learning to predict CVD risk helps identify high-risk patients so that interventions can be made in time. In this study, we propose XGBH, a machine learning model for CVD risk prediction based on key contributing features. To improve the model's generalisation, retrospective data from 14,832 CVD patients in Shanxi, China, were added to the Kaggle dataset. In validation, the XGBH model was more accurate (AUC = 0.81) than the baseline risk score (AUC = 0.65), and including the conventional biometric variable BMI improved its accuracy for CVD risk prediction. To make the model easier to apply clinically, we also designed a simpler diagnostic model that requires only three patient characteristics (age, systolic blood pressure, and whether cholesterol is normal) and supports early intervention for high-risk patients with only a slight reduction in accuracy (AUC = 0.79). Ultimately, a CVD risk score model with few features and high accuracy will be established from the main contributing features. Further prospective studies, as well as studies in other populations, are needed to assess the actual clinical effectiveness of the XGBH risk prediction model.
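The models above are compared by AUC (area under the ROC curve). As a minimal sketch of what that metric measures, the following computes AUC as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. The function name `auc` and all labels and scores are hypothetical toy values chosen for illustration; they are not data or code from the study, and the abstract does not state how the authors computed AUC.

```python
def auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0/1 outcomes (1 = CVD, 0 = no CVD).
    scores: iterable of predicted risk scores, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not study data).
labels = [1, 1, 1, 0, 0, 0]
full_model_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
simple_model_scores = [0.9, 0.45, 0.4, 0.5, 0.3, 0.2]

print(round(auc(labels, full_model_scores), 3))
print(round(auc(labels, simple_model_scores), 3))
```

An AUC of 0.5 corresponds to ranking cases at random and 1.0 to perfect ranking, which is the scale on which the reported values of 0.65, 0.79 and 0.81 should be read.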