A data-driven approach to predicting diabetes and cardiovascular disease with machine learning.

Affiliations

Department of Mathematics and Computer Science, Eastern Oregon University, La Grande, OR, USA.

Department of Mathematics and Statistics, Winona State University, Winona, MN, USA.

Publication

BMC Med Inform Decis Mak. 2019 Nov 6;19(1):211. doi: 10.1186/s12911-019-0918-5.

Abstract

BACKGROUND

Diabetes and cardiovascular disease are two of the leading causes of death in the United States. Identifying and predicting these diseases in patients is the first step toward stopping their progression. We evaluate how well machine learning models detect at-risk patients from survey data (with and without laboratory results), and we identify the key variables in the data that contribute to these diseases among the patients.

METHODS

Our research explores data-driven approaches that use supervised machine learning models to identify patients with these diseases. Using the National Health and Nutrition Examination Survey (NHANES) dataset, we conduct an exhaustive search of all available feature variables in the data to develop models for detecting cardiovascular disease, prediabetes, and diabetes. Multiple machine learning models (logistic regression, support vector machines, random forest, and gradient boosting) were evaluated on their classification performance across different time frames and feature sets (with and without laboratory data). The models were then combined into a weighted ensemble model that leverages the strengths of the individual models to improve detection accuracy. The information gain of the tree-based models was used to identify the key variables in the patient data that contributed to the learned models' detection of at-risk patients in each disease class.
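The Methods name two computations without giving formulas: a weighted ensemble and information gain. A minimal sketch of each follows in plain Python. It assumes the ensemble is a weighted average of each model's predicted probability of disease (soft voting); the weights used here are hypothetical, since the abstract does not say how the authors chose them. The information-gain function uses the classic entropy-based decision-tree criterion for a single threshold split. Library importance scores are computed differently (XGBoost's "gain" importance, for example, measures reduction in the training loss rather than entropy), so this illustrates the idea rather than reproducing the authors' exact computation.

```python
import math
from collections import Counter


def weighted_ensemble(prob_lists, weights):
    """Weighted average of per-model positive-class probabilities.

    prob_lists: one list of probabilities per model, all the same length.
    weights: one non-negative weight per model (hypothetical values below).
    """
    if len(prob_lists) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    n = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n)
    ]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(labels, feature, threshold):
    """Entropy reduction from splitting the patients on feature <= threshold."""
    left = [y for y, x in zip(labels, feature) if x <= threshold]
    right = [y for y, x in zip(labels, feature) if x > threshold]
    n = len(labels)
    # Weighted entropy of the two child nodes (empty children contribute 0).
    children = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - children


# Toy, made-up inputs (not NHANES data): three models scoring four patients.
scores = weighted_ensemble(
    [[0.2, 0.7, 0.4, 0.9], [0.1, 0.8, 0.6, 0.7], [0.3, 0.9, 0.5, 0.8]],
    weights=[1.0, 2.0, 1.0],
)
# A hypothetical waist-size split that separates the classes perfectly
# gains the full 1 bit of a balanced binary label.
gain = information_gain([0, 0, 1, 1], [80, 85, 100, 105], threshold=90)
```

Giving a model a larger weight simply pulls the combined score toward that model's output; in practice the weights would be tuned on held-out data.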

RESULTS

The ensemble model developed for cardiovascular disease (based on 131 variables) achieved an area under the receiver operating characteristic curve (AU-ROC) of 83.1% without laboratory results and 83.9% accuracy with laboratory results. In diabetes classification (based on 123 variables), the eXtreme Gradient Boosting (XGBoost) model achieved an AU-ROC of 86.2% without laboratory data and 95.7% with laboratory data. For prediabetic patients, the ensemble model had the highest AU-ROC, 73.7%, without laboratory data, while with laboratory data XGBoost performed best at 84.4%. The top five predictors for diabetes were 1) waist size, 2) age, 3) self-reported weight, 4) leg length, and 5) sodium intake. For cardiovascular disease, the models identified 1) age, 2) systolic blood pressure, 3) self-reported weight, 4) occurrence of chest pain, and 5) diastolic blood pressure as the key contributors.
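AU-ROC can be read as the probability that a randomly chosen patient with the disease receives a higher risk score than a randomly chosen patient without it, with ties counted as one half; 0.5 is chance level and 1.0 is perfect ranking. The sketch below computes it straight from that definition in plain Python. It is quadratic in the number of patients and meant only for illustration, and the labels and scores in the example are made up.

```python
def auc_roc(labels, scores):
    """AU-ROC as P(score of a random positive > score of a random negative).

    labels: 1 for patients with the disease, 0 otherwise.
    scores: predicted risk for each patient (higher means more at risk).
    Ties between a positive and a negative count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 3 of the 4 positive/negative pairs are ranked correctly.
print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because AU-ROC depends only on how patients are ranked, it is unaffected by the decision threshold, unlike accuracy.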

CONCLUSION

We conclude that machine-learned models based on survey questionnaires can provide an automated mechanism for identifying patients at risk of diabetes and cardiovascular disease. We also identify key contributors to the prediction, whose implications for electronic health records can be explored further.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8ec3/6836338/54c63ca0e8d2/12911_2019_918_Fig1_HTML.jpg