Tanaka Marenao, Akiyama Yukinori, Mori Kazuma, Hosaka Itaru, Endo Keisuke, Ogawa Toshifumi, Sato Tatsuya, Suzuki Toru, Yano Toshiyuki, Ohnishi Hirofumi, Hanawa Nagisa, Furuhashi Masato
Department of Cardiovascular, Renal and Metabolic Medicine, Sapporo Medical University School of Medicine, Sapporo, Japan.
Tanaka Medical Clinic, Yoichi, Japan.
Clin Exp Hypertens. 2025 Dec;47(1):2449613. doi: 10.1080/10641963.2025.2449613. Epub 2025 Jan 8.
Sufficient attention has not been given to machine learning (ML) models using longitudinal data for investigating important predictors of new onset of hypertension. We investigated the predictive ability of several ML models for the development of hypertension.
A total of 15,965 Japanese participants (men/women: 9,466/6,499; mean age: 45 years) who received annual health examinations were randomly divided into a training group (70%, n = 11,175) and a test group (30%, n = 4,790). The predictive abilities of 58 candidate parameters, including the fatty liver index (FLI), which is calculated from body mass index, waist circumference, and levels of γ-glutamyl transferase and triglycerides, were assessed by statistics analogous to the area under the curve (AUC) in receiver operating characteristic curve analyses. The ML models used included logistic regression, random forest, naïve Bayes, extreme gradient boosting, and an artificial neural network.
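The abstract names the four inputs to FLI but does not give the formula. As a minimal sketch, the following assumes the commonly used FLI definition of Bedogni et al. (2006), with triglycerides in mg/dL, γ-glutamyl transferase in U/L, and waist circumference in cm. The function name and argument names are hypothetical, and whether the study used exactly these units and coefficients is an assumption.

```python
import math


def fatty_liver_index(bmi, waist_cm, ggt_u_per_l, tg_mg_per_dl):
    """Return the fatty liver index (FLI) on a 0-100 scale.

    Assumes the Bedogni et al. (2006) coefficients; the abstract itself
    does not state the formula, so treat this as an illustration only.
    """
    if ggt_u_per_l <= 0 or tg_mg_per_dl <= 0:
        raise ValueError("GGT and triglycerides must be positive")
    # Linear predictor: log-transformed triglycerides and GGT,
    # untransformed BMI and waist circumference.
    linear = (
        0.953 * math.log(tg_mg_per_dl)
        + 0.139 * bmi
        + 0.718 * math.log(ggt_u_per_l)
        + 0.053 * waist_cm
        - 15.745
    )
    # Logistic transform, scaled to 0-100.
    return 100.0 / (1.0 + math.exp(-linear))
```

Because the index is a logistic transform of a linear score, it always lies strictly between 0 and 100 and rises with each of the four inputs.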
During a 10-year follow-up period (mean: 6.1 years), 2,132 subjects (19.1%) in the training group and 917 subjects (19.1%) in the test group had new onset of hypertension. Among the 58 parameters, systolic blood pressure, age and FLI were identified as important candidates by random forest feature selection with 10-fold cross-validation. The AUCs of the ML models ranged from 0.765 to 0.825, and discriminatory capacity was significantly better in the artificial neural network model than in the logistic regression model.
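To make the AUC figures concrete, the sketch below computes an AUC as the probability that a randomly chosen subject who developed hypertension receives a higher predicted score than a randomly chosen subject who did not, counting ties as one half. This is the standard pairwise definition, not the authors' code; the function name is hypothetical, and the abstract's "statistics analogous to the AUC" may have been computed differently.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes (1 = event occurred).
    scores: iterable of predicted risks, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of subjects, which is acceptable for illustration; a rank-based implementation would be preferable for a cohort of several thousand.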
The development of hypertension can be simply and accurately predicted by each ML model using systolic blood pressure, age and FLI as selected features. By building multiple ML models, more practical prediction might be possible.