[Machine learning for predictive analyses in health: an example of an application to predict death in the elderly in São Paulo, Brazil].

Author information

Hellen Geremias dos Santos, Carla Ferreira do Nascimento, Rafael Izbicki, Yeda Aparecida de Oliveira Duarte, Alexandre Dias Porto Chiavegatto Filho

Affiliations

Faculdade de Saúde Pública, Universidade de São Paulo, São Paulo, Brasil.

Centro de Ciências Exatas e de Tecnologia, Universidade Federal de São Carlos, São Carlos, Brasil.

Publication information

Cad Saude Publica. 2019 Jul 29;35(7):e00050818. doi: 10.1590/0102-311X00050818.

Abstract

This study aims to present the stages involved in using machine learning algorithms for predictive analyses in health. An application was performed on a database of elderly residents of the city of São Paulo, Brazil, who participated in the Health, Well-Being, and Aging Study (SABE) (n = 2,808). The outcome variable was the occurrence of death within five years of the participant's entry into the study (n = 423), and the predictors were 37 variables related to the participants' demographic, socioeconomic, and health profiles. The application was organized into the following stages: division of the data into training (70%) and test (30%) sets, pre-processing of the predictors, learning, and assessment of the models. In the learning stage, five algorithms were used to fit the models: logistic regression with and without penalization, neural networks, gradient boosted trees, and random forest. The algorithms' hyperparameters were tuned by 10-fold cross-validation to select those yielding the best models. For each algorithm, the best model was assessed on the test data using the area under the ROC curve (AUC) and related measures. All models achieved an AUC greater than 0.70. For the three models with the highest AUC (neural networks, logistic regression with LASSO penalization, and logistic regression without penalization, in that order), measures of the quality of the predicted probabilities were also assessed. With the increasing availability of data and of trained human capital, it is expected that predictive machine learning models can be developed with the potential to help health professionals make the best decisions.
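The stages listed above (a 70/30 split, pre-processing, learning with hyperparameters tuned by 10-fold cross-validation, and assessment by test-set AUC) can be illustrated with a minimal sketch, assuming Python with only the standard library. It is not the authors' code: the data are synthetic (four hypothetical predictors instead of the 37 SABE variables), only one of the five algorithms is shown (LASSO-penalized logistic regression, fitted here by proximal gradient descent), and the function names, the penalty grid, the learning rate, and the choice to compute the cross-validated AUC from pooled out-of-fold predictions are all illustrative assumptions.

```python
# Hypothetical sketch, standard library only: synthetic data, not SABE, and
# not the authors' code. LASSO logistic regression stands in for all 5 models.
import math
import random


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))


def train_test_split(n, test_frac=0.3, seed=0):
    """Shuffle row indices and split them into training and test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def kfold(indices, k=10, seed=0):
    """Yield (train, validation) index lists for k disjoint folds."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        yield [j for f in folds[:i] + folds[i + 1:] for j in f], folds[i]


def standardize(train_X, other_X):
    """Centre/scale columns using training statistics only (no leakage)."""
    n, p = len(train_X), len(train_X[0])
    mu = [sum(r[j] for r in train_X) / n for j in range(p)]
    sd = [math.sqrt(sum((r[j] - mu[j]) ** 2 for r in train_X) / n) or 1.0
          for j in range(p)]

    def scale(rows):
        return [[(r[j] - mu[j]) / sd[j] for j in range(p)] for r in rows]
    return scale(train_X), scale(other_X)


def fit_lasso_logistic(X, y, lam, lr=0.1, epochs=100):
    """L1-penalised logistic regression via proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0  # the intercept b is not penalised
    for _ in range(epochs):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err / n
            for j in range(p):
                gw[j] += err * xi[j] / n
        b -= lr * gb
        for j in range(p):
            z = w[j] - lr * gw[j]
            # Soft-thresholding: the proximal step for the L1 (LASSO) penalty.
            w[j] = math.copysign(max(abs(z) - lr * lam, 0.0), z)
    return w, b


def predict(model, X):
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def roc_auc(y, scores):
    """P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = sum(1.0 if a > c else 0.5 if a == c else 0.0 for a in pos for c in neg)
    return wins / (len(pos) * len(neg))


def cv_auc(X, y, idx, lam, k=10):
    """k-fold CV AUC from pooled out-of-fold predictions, for one lambda."""
    ys, ps = [], []
    for tr, va in kfold(idx, k):
        Xtr, Xva = standardize([X[i] for i in tr], [X[i] for i in va])
        ps += predict(fit_lasso_logistic(Xtr, [y[i] for i in tr], lam), Xva)
        ys += [y[i] for i in va]
    return roc_auc(ys, ps)


if __name__ == "__main__":
    rng = random.Random(1)  # synthetic predictors; only x[0], x[1] carry signal
    X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
    y = [int(rng.random() < sigmoid(-1.5 + 1.2 * x[0] - 0.8 * x[1])) for x in X]
    train, test = train_test_split(len(y))  # 70% training, 30% test
    grid = [0.001, 0.01, 0.1]  # hypothetical penalty grid
    best = max(grid, key=lambda lam: cv_auc(X, y, train, lam))
    Xtr, Xte = standardize([X[i] for i in train], [X[i] for i in test])
    model = fit_lasso_logistic(Xtr, [y[i] for i in train], best)
    print("selected lambda:", best)
    print("test AUC:", round(roc_auc([y[i] for i in test], predict(model, Xte)), 3))
```

Two choices in this sketch matter more than the particular model: the standardization statistics are recomputed inside every cross-validation fold and on the full training set, never on the test rows, and the test set is scored only once, after the penalty has been selected. Substituting another learner (for example, a random forest) would change only `fit_lasso_logistic` and `predict`.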
