Institute of Public Health, School of Medicine, National Yang-Ming University, Taipei 112, Taiwan.
Institute of Public Health, School of Medicine, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan.
Int J Environ Res Public Health. 2021 May 6;18(9):4943. doi: 10.3390/ijerph18094943.
Background: Early detection of heart failure is the basis for better medical treatment and prognosis. Over recent decades, both the prevalence and the incidence of heart failure have increased worldwide, making it a significant global public health issue. Early diagnosis is difficult, however, because the symptoms of heart failure are usually non-specific. This study therefore aims to develop a machine learning-based risk prediction model for incident heart failure. Although African Americans have a higher risk of incident heart failure than other populations, few studies have developed a heart failure risk prediction model for them.
Methods: This research applied Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression, support vector machine, random forest, and Extreme Gradient Boosting (XGBoost) to build a predictive model from Jackson Heart Study data. In real-data analysis, missing data are a problem when building a predictive model. We therefore evaluated the inclusion of predictors at various missing-rate thresholds, combined with different imputation strategies, to find the best-performing analysis.
Results: Among the hundreds of models examined, the best predictive model was an XGBoost model that included variables with a missing rate below 30 percent, with missing values imputed by non-parametric random forest imputation. This optimal XGBoost model achieved an Area Under the Curve (AUC) of 0.8409 for predicting heart failure in the Jackson Heart Study.
Conclusions: This research identifies variation in diabetes medication as the most important risk factor for heart failure among African Americans, a finding that the complete-case approach failed to detect.
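Two steps named in the abstract, including predictors by missing rate and scoring a model by AUC, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names (`missing_rate`, `select_by_missing_rate`, `auc`), the use of `None` to mark missing values, and the toy data are all assumptions made for illustration. The imputation and model-fitting steps (random forest imputation, XGBoost) are deliberately left out.

```python
from typing import Dict, List, Optional, Sequence

Column = List[Optional[float]]  # assumption: None marks a missing value


def missing_rate(values: Sequence[Optional[float]]) -> float:
    """Fraction of entries in a column that are missing (None)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def select_by_missing_rate(
    columns: Dict[str, Column], threshold: float = 0.30
) -> Dict[str, Column]:
    """Keep only predictors whose missing rate is strictly below `threshold`."""
    return {
        name: col for name, col in columns.items() if missing_rate(col) < threshold
    }


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration only (not from the Jackson Heart Study).
toy = {
    "sbp": [120.0, None, 135.0, 142.0, 118.0],  # 1 of 5 missing: kept
    "hba1c": [None, None, 6.1, None, 7.2],      # 3 of 5 missing: dropped
}
kept = select_by_missing_rate(toy)
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise AUC here is quadratic in the number of cases, which is fine for a small illustration; a rank-based implementation would be the usual choice for a cohort-sized dataset.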