Comparison of machine learning techniques to predict all-cause mortality using fitness data: the Henry Ford ExercIse Testing (FIT) Project.

Author information

King AbdulAziz Cardiac Center, Ministry of National Guard, Health Affairs, King Abdulaziz Medical City for National Guard - Health affairs, King Abdullah International Medical Research Center, King Saud bin Abdulaziz University for Health Sciences, Department Mail Code: 1413, P.O. Box 22490, Riyadh, 11426, Kingdom of Saudi Arabia.

Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Publication information

BMC Med Inform Decis Mak. 2017 Dec 19;17(1):174. doi: 10.1186/s12911-017-0566-6.

Abstract

BACKGROUND

Prior studies have demonstrated that cardiorespiratory fitness (CRF) is a strong marker of cardiovascular health. Machine learning (ML) can improve outcome prediction through classification techniques that assign records to predetermined categories. This study evaluates and compares how ML techniques can be applied to medical records of cardiorespiratory fitness and how they differ in their ability to predict medical outcomes (e.g. mortality).

METHODS

We used data from 34,212 patients free of known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 10-year follow-up. Seven ML classification techniques were evaluated: Decision Tree (DT), Support Vector Machine (SVM), Artificial Neural Network (ANN), Naïve Bayesian Classifier (BC), Bayesian Network (BN), K-Nearest Neighbor (KNN) and Random Forest (RF). The Synthetic Minority Over-Sampling Technique (SMOTE) was used to handle the class imbalance in the dataset.
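The oversampling step can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. This is an illustration only, not the study's implementation; the function name `smote`, the two features (age and peak METs) and the toy values are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data for the minority class: (age, peak METs).
deaths = [(71.0, 4.2), (68.0, 5.1), (75.0, 3.8), (66.0, 6.0)]
new_points = smote(deaths, n_synthetic=6, k=2)
```

Because every synthetic point is an interpolation between two real minority samples, it always lies inside the range spanned by the observed minority data. Oversampling of this kind is normally applied to the training split only, so that evaluation data contain no synthetic records.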

RESULTS

Two sets of experiments were conducted, with and without SMOTE sampling. Averaged over the evaluation metrics, the SVM classifier showed the lowest performance, while other models such as BN, BC and DT performed better. Among all models trained with SMOTE sampling, the RF classifier showed the best performance (AUC = 0.97).
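To make the AUC figure concrete, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count one half), and contrasts it with plain accuracy on an imbalanced toy set. The labels, scores and function names are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly at the given score threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


# Hypothetical toy data: 1 = died during follow-up, 0 = survived.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
constant = [0.1] * 10  # a model that ignores its input
ranked = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.45, 0.35]

print(accuracy(labels, constant), roc_auc(labels, constant))  # 0.8 0.5
print(accuracy(labels, ranked), roc_auc(labels, ranked))      # 0.8 0.9375
```

Both toy models reach the same accuracy because neither crosses the 0.5 threshold, yet only the second ranks the deaths near the top. This is one reason a comparison on imbalanced data reports several metrics instead of accuracy alone.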

CONCLUSIONS

The results show that ML techniques can vary significantly in their performance across the different evaluation metrics. A more complex ML model does not necessarily achieve higher prediction accuracy. The prediction performance of all models trained with SMOTE was much better than that of models trained without SMOTE. The study shows the potential of ML methods for predicting all-cause mortality using cardiorespiratory fitness data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c601/5735871/fba67734c6b8/12911_2017_566_Fig1_HTML.jpg
