Bai Xue, Liu Wenjun, Huang Hui, You Huan
School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China.
Department of Ultrasound, Affiliated Hospital of Nanjing University of CM, Nanjing 210029, China.
Iran J Public Health. 2022 Sep;51(9):2099-2107. doi: 10.18502/ijph.v51i9.10565.
Hypertension is the main reason why the incidence of cardiovascular disease has increased year by year, and early diagnosis of hypertension is necessary to reduce that incidence. This places higher demands on diagnostic accuracy. We compared several feature selection methods to improve the accuracy of logistic regression (LR).
We collected 397 samples from Nanjing, Jiangsu, China between Jan 2016 and Dec 2017, including 178 hypertension samples and 219 control samples. The dataset includes not only clinical and laboratory data, but also imaging data. We focused on the differences in imaging attributes between the control group and the hypertension group, and analyzed the correlation coefficients of all attributes. To establish the optimal LR model, this study compared three feature selection methods: statistical analysis, random forest (RF) and extreme gradient boosting (XGBoost). The area under the ROC curve (AUC) and accuracy were used as the main criteria for model evaluation.
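The RF-based variant of this workflow can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the study dataset, and the number of attributes, the 80/20 split, the median importance threshold and all hyperparameters are assumptions chosen only to make the example run.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the study data (assumption): 397 samples, 20 attributes.
X, y = make_classification(
    n_samples=397, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 1: a random forest ranks attributes by importance; those at or above
#         the median importance are kept (the threshold is an assumption).
# Step 2: logistic regression is fitted on the selected attributes only.
model = make_pipeline(
    StandardScaler(),
    SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=0),
        threshold="median",
    ),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# Evaluate with the two criteria named in the text: accuracy and AUC.
accuracy = accuracy_score(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())

print(f"selected attributes: {n_selected} of {X.shape[1]}")
print(f"accuracy: {accuracy:.3f}, AUC: {auc:.3f}")
```

Putting the selector inside the pipeline means attribute importances are estimated from the training split only, so the held-out samples do not influence which attributes are kept. Swapping the random forest for another importance-based estimator would give the analogous sketch for a different selection method.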
In the prediction of hypertension, LR with RF as the feature selection method (accuracy: 0.910; AUC: 0.924) achieved higher accuracy than LR with XGBoost as the feature selection method (accuracy: 0.897; AUC: 0.915) and LR with statistical analysis as the feature selection method (accuracy: 0.872; AUC: 0.926), although the latter had a marginally higher AUC.
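Accuracy and AUC can rank models differently, as the figures above show for the statistical-analysis variant. The sketch below illustrates why, using plain Python and made-up scores (the labels, scores and 0.5 decision threshold are all hypothetical, not study data): accuracy depends on a fixed threshold, whereas AUC depends only on how well cases are ranked above controls.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = hypertension, 0 = control) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.90, 0.80, 0.40, 0.45, 0.20, 0.10]  # one case ranked below a control
scores_b = [0.45, 0.40, 0.35, 0.30, 0.20, 0.10]  # perfect ranking, all below 0.5

for name, scores in (("A", scores_a), ("B", scores_b)):
    print(name, round(accuracy(labels, scores), 3), round(roc_auc(labels, scores), 3))
```

Model B ranks every case above every control, so its AUC is higher, yet at the 0.5 threshold it labels every sample as a control and its accuracy is lower than model A's.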
LR with RF as the feature selection method may provide accurate results in predicting hypertension. Carotid intima-media thickness (cIMT) and pulse wave velocity at the end of systole (ESPWV) are two key imaging indicators in the prediction of hypertension.