Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan.
Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan.
PLoS One. 2024 Nov 13;19(11):e0313658. doi: 10.1371/journal.pone.0313658. eCollection 2024.
This paper proposes using machine learning models to predict a person's future risk of hypertension from the routine health checkup data of their current and past visits to a health checkup center. The large-scale, high-dimensional dataset used in this study comes from the MJ Health Research Foundation in Taiwan. The training data is split into 5 folds and used to train 5 models in a 5-fold cross-validation manner. When predicting on the test set, the voted result of the 5 models is used as the final prediction. Experimental results show that our models achieve 69.59% precision, 77.90% recall, and a 73.51% F1-score, outperforming a baseline that uses only the blood pressure from visitors' last visits. Experiments also show that visitors who undergo health checkups more often are predicted more accurately, and that models trained with selected important factors achieve better results than those trained with the Framingham risk score. We also demonstrate the possibility of using our models to recommend weight control to visitors by adding to the model input virtual visits that assume their body weight can be reduced in the near future. Experimental results show that around 5.48% of the true positive cases with a high Body Mass Index are rejudged as negative, and a rising trend appears as more virtual visits are added, which may be used to advise visitors that controlling their body weight for a longer time leads to a lower probability of having hypertension in the future.
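The fold-wise training and voting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the model family or the exact voting rule, so the code assumes a simple majority vote, uses a toy threshold classifier on a single made-up blood pressure feature as a stand-in model, and runs on synthetic values. All function names are hypothetical. The last function shows the standard F1 formula, which reproduces the reported 73.51% from the reported precision and recall.

```python
from statistics import mean


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def majority_vote(votes):
    """Return 1 if more than half of the binary votes are 1, else 0.
    Assumption: the abstract says only that the 'voted result' is used."""
    return 1 if sum(votes) * 2 > len(votes) else 0


def k_fold_indices(n_samples, k=5):
    """Split sample indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def fit_threshold(values, labels):
    """Toy stand-in model: the midpoint between the two class means."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return (mean(pos) + mean(neg)) / 2


def train_fold_models(values, labels, k=5):
    """Train k models, each on the data outside one held-out fold."""
    models = []
    for held_out in k_fold_indices(len(values), k):
        skip = set(held_out)
        train_x = [v for i, v in enumerate(values) if i not in skip]
        train_y = [y for i, y in enumerate(labels) if i not in skip]
        models.append(fit_threshold(train_x, train_y))
    return models


def predict(models, value):
    """Each model votes 1 when the value reaches its threshold."""
    return majority_vote([1 if value >= t else 0 for t in models])


# Synthetic systolic readings (illustrative only, not data from the study).
values = [110, 112, 115, 118, 120, 122, 125, 128, 130, 132,
          138, 140, 142, 145, 148, 150, 152, 155, 158, 160]
labels = [0] * 10 + [1] * 10

models = train_fold_models(values, labels, k=5)
print(len(models), predict(models, 112), predict(models, 158))
print(round(f1_score(0.6959, 0.7790), 4))
```

With an odd number of models, a binary majority vote cannot tie, which is one practical reason to pair 5-fold training with voting.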