Oliveira Bruno Alberto Soares, Castro Giulia Zanon, Ferreira Giovanna Luiza Medina, Guimarães Frederico Gadelha
Graduate Program in Electrical Engineering, Universidade Federal de Minas Gerais, Avenue Antônio Carlos 6627, Belo Horizonte, 31270-901, Minas Gerais, Brazil.
Faculdade de Medicina, Universidade de Itaúna, Rodovia MG 431 km 45, Itaúna, 35680-142, Minas Gerais, Brazil.
Med Biol Eng Comput. 2023 Jun;61(6):1409-1425. doi: 10.1007/s11517-022-02757-z. Epub 2023 Jan 31.
Cardiovascular diseases (CVDs) are among the leading causes of mortality worldwide, with more than 23 million related deaths per year projected by 2030, according to the World Heart Federation. Although most of these diseases can be prevented, population awareness strategies are still ineffective. In this context, we propose the CML-Cardio tool, a machine learning application that automates the classification of a person's risk of developing CVDs. For this, researchers in our group collected data on diabetes, blood pressure, and other risk factors at a private company. Our final model consists of a cascade system designed to handle highly imbalanced data. In the first stage, a binary model predicts whether a patient has a low risk of developing CVDs or a risk that needs attention. In this step, we evaluate six algorithms: logistic regression, support vector machine (SVM), random forest, XGBoost, CatBoost, and multilayer perceptron. The best results showed an average accuracy of 0.86 ± 0.03 and an F-score of 0.85 ± 0.04. We interpret each feature's impact on the models' output and validate the subsystem for the next step. In the second stage, we use an anomaly detection model to learn the intermediate-risk patterns present in the instances that need attention. The cascade model showed an average accuracy of 0.80 ± 0.07 and an F-score of 0.70 ± 0.07. Finally, we develop the CML-Cardio prototype of an actual application as a primary prevention strategy.
Graphical abstract: In this work, we propose the CML-Cardio tool, a cascade machine learning method to classify cardiovascular disease risk.
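The two-stage cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features, the anomaly detection algorithm, or how anomalies map to risk classes. The sketch assumes that stage 2 is fitted only on intermediate-risk instances and that anything it flags as anomalous is treated as high risk. The feature pair (systolic blood pressure, fasting glucose), the threshold rule standing in for the trained binary classifier, the z-score detector, and all numeric values are hypothetical and have no clinical meaning.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev
from typing import Callable, List, Sequence

Row = Sequence[float]  # hypothetical features: (systolic pressure, fasting glucose)


@dataclass
class ZScoreAnomalyDetector:
    """Toy stand-in for the stage-2 anomaly detection model (assumption)."""

    threshold: float = 3.0
    means: List[float] = field(default_factory=list)
    stds: List[float] = field(default_factory=list)

    def fit(self, rows: Sequence[Row]) -> "ZScoreAnomalyDetector":
        # Learn per-feature mean and standard deviation of the inlier class.
        columns = list(zip(*rows))
        self.means = [mean(col) for col in columns]
        self.stds = [stdev(col) or 1.0 for col in columns]  # avoid division by zero
        return self

    def is_anomaly(self, row: Row) -> bool:
        # A row is anomalous if any feature lies far from the learned pattern.
        return any(
            abs(x - m) / s > self.threshold
            for x, m, s in zip(row, self.means, self.stds)
        )


def cascade_predict(
    row: Row,
    needs_attention: Callable[[Row], bool],
    detector: ZScoreAnomalyDetector,
) -> str:
    """Stage 1 separates low risk from 'needs attention'; stage 2 splits the rest."""
    if not needs_attention(row):
        return "low"
    return "high" if detector.is_anomaly(row) else "intermediate"


# Hypothetical stage-1 rule; in practice this would be a trained binary classifier.
def toy_stage_one(row: Row) -> bool:
    return row[0] >= 130 or row[1] >= 100


# Hypothetical intermediate-risk training rows for stage 2.
intermediate_rows = [(135, 105), (140, 110), (138, 102), (132, 108), (145, 112)]
stage_two = ZScoreAnomalyDetector().fit(intermediate_rows)

for patient in [(118, 90), (139, 106), (185, 160)]:
    print(patient, cascade_predict(patient, toy_stage_one, stage_two))
```

Fitting the second stage on a single class is one common way to cope with a scarce minority class, since the detector never needs labeled high-risk examples. Whether the published model works this way is not stated in the abstract.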