King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia.
King Abdullah International Medical Research Center, Riyadh, Saudi Arabia.
PLoS One. 2018 Apr 18;13(4):e0195344. doi: 10.1371/journal.pone.0195344. eCollection 2018.
This study evaluates and compares the performance of different machine learning techniques in predicting, from cardiorespiratory fitness data, which individuals are at risk of developing hypertension and are likely to benefit most from interventions. The dataset contains information on 23,095 patients who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 10-year follow-up. The variables include vital signs, diagnoses and clinical laboratory measurements. Six machine learning techniques were investigated: LogitBoost (LB), Bayesian Network classifier (BN), Locally Weighted Naive Bayes (LWB), Artificial Neural Network (ANN), Support Vector Machine (SVM) and Random Tree Forest (RTF). Across different validation methods, the RTF model showed the best performance (AUC = 0.93) and outperformed all other machine learning techniques examined in this study. The results also show that it is critical to carefully explore and evaluate the performance of machine learning models using various evaluation methods, because prediction accuracy can differ significantly between them.
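The abstract reports model quality as AUC and stresses that results depend on how a model is validated. As a minimal sketch (not the authors' code, which the abstract does not describe), the pure-Python functions below compute AUC as the normalized Mann-Whitney statistic and average it over stratified k-fold cross-validation. All helper names (`roc_auc`, `stratified_k_fold`, `cross_validated_auc`, `fit_mean_difference`) are hypothetical, and the toy scorer is a stand-in, not one of the six techniques evaluated in the study.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Scorer = Callable[[Row], float]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = probability that a random positive case outscores a random
    negative case, with ties counted as 1/2 (normalized Mann-Whitney U)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_k_fold(labels: Sequence[int], k: int,
                      seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs. Each class is shuffled and dealt
    round-robin across folds, so every test fold keeps both classes."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, lab in enumerate(labels) if lab == cls]
        if len(members) < k:
            raise ValueError("each class needs at least k examples")
        rng.shuffle(members)
        for rank, i in enumerate(members):
            folds[rank % k].append(i)
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


def cross_validated_auc(fit: Callable[[List[Row], List[int]], Scorer],
                        X: Sequence[Row], y: Sequence[int],
                        k: int = 5, seed: int = 0) -> float:
    """Mean test-fold AUC. `fit` is any training function that takes training
    rows and labels and returns a function scoring a single row."""
    aucs = []
    for train, test in stratified_k_fold(y, k, seed):
        score = fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(roc_auc([y[i] for i in test], [score(X[i]) for i in test]))
    return mean(aucs)


def fit_mean_difference(X_train: List[Row], y_train: List[int]) -> Scorer:
    """Hypothetical toy model: scores a row by its dot product with the
    difference between the positive-class and negative-class feature means."""
    pos = [x for x, lab in zip(X_train, y_train) if lab == 1]
    neg = [x for x, lab in zip(X_train, y_train) if lab == 0]
    w = [mean(r[j] for r in pos) - mean(r[j] for r in neg)
         for j in range(len(X_train[0]))]
    return lambda row: sum(wj * xj for wj, xj in zip(w, row))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    X = [[float(i)] for i in range(20)]
    y = [1 if i >= 10 else 0 for i in range(20)]
    print(cross_validated_auc(fit_mean_difference, X, y, k=5))  # 1.0
```

Passing a different `fit` function, or changing `k` and `seed`, yields the kind of side-by-side comparison the abstract argues for; a single holdout split and a cross-validated average can give noticeably different AUCs for the same model.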