Development and validation of predictive models for unplanned hospitalization in the Basque Country: analyzing the variability of non-deterministic algorithms.

Affiliations

Basque Center for Applied Mathematics (BCAM), Bilbao, Spain.

Network for Research on Chronicity, Primary Care, and Health Promotion (RICAPPS), Barakaldo, Spain.

Publication information

BMC Med Inform Decis Mak. 2023 Aug 5;23(1):152. doi: 10.1186/s12911-023-02226-z.

Abstract

BACKGROUND

Progressive population ageing in developed countries entails an increase in multimorbidity. Population-wide predictive models for adverse health outcomes are crucial to address these growing healthcare needs. The main objective of this study is to develop and validate a population-based prognostic model to predict the probability of unplanned hospitalization in the Basque Country by comparing the performance of a logistic regression model and three families of machine learning models.

METHODS

Using age, sex, diagnoses and drug prescriptions previously transformed by the Johns Hopkins Adjusted Clinical Groups (ACG) System, we predict the probability of unplanned hospitalization in the Basque Country (2.2 million inhabitants) using several techniques. When dealing with non-deterministic algorithms, comparing a single model per technique is not enough to choose the best approach. Thus, we conduct 40 experiments per family of models (Random Forest, Gradient Boosting Decision Trees and Multilayer Perceptrons) and compare them to Logistic Regression. Model performance is compared both population-wide and for the 20,000 patients with the highest predicted probabilities, who serve as a hypothetical high-risk group for intervention.
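The idea of running many experiments per non-deterministic technique can be sketched in a few lines. The example below is only an illustration under loud assumptions: the cohort is synthetic, the learner is a small logistic model trained by stochastic gradient descent (standing in for any seed-dependent algorithm), and all names (`make_data`, `train_sgd`, `run_experiments`) are hypothetical rather than taken from the study.

```python
import math
import random
import statistics

def make_data(n, rng):
    """Synthetic cohort (assumption): two features and a binary outcome."""
    rows = []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        p = 1 / (1 + math.exp(-(1.5 * x1 - 1.0 * x2 - 2.0)))
        rows.append(((x1, x2), 1 if rng.random() < p else 0))
    return rows

def train_sgd(rows, seed, epochs=3, lr=0.1):
    """Seed-dependent learner: random initial weights and random sample order."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 0.5), rng.gauss(0, 0.5)]
    b = rng.gauss(0, 0.5)
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            (x1, x2), y = rows[i]
            p = 1 / (1 + math.exp(-(w[0] * x1 + w[1] * x2 + b)))
            g = p - y  # gradient of the log loss with respect to the logit
            w[0] -= lr * g * x1
            w[1] -= lr * g * x2
            b -= lr * g
    return w, b

def predict(model, x):
    w, b = model
    return 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))

def auc(y_true, y_score):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def run_experiments(n_runs=40):
    """Train the same technique n_runs times, changing only the seed."""
    data_rng = random.Random(0)  # the data split stays fixed across runs
    train, test = make_data(2000, data_rng), make_data(500, data_rng)
    y_test = [y for _, y in test]
    scores = []
    for seed in range(n_runs):
        model = train_sgd(train, seed)
        scores.append(auc(y_test, [predict(model, x) for x, _ in test]))
    return scores

scores = run_experiments()
q1, med, q3 = statistics.quantiles(scores, n=4)
print(f"median AUC = {med:.3f}, interquartile range = {q3 - q1:.4f}")
```

Keeping the data split fixed and varying only the seed isolates the variability that comes from the algorithm itself, which is the quantity a single model per technique cannot reveal.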

RESULTS

The best-performing technique is Multilayer Perceptron, followed by Gradient Boosting Decision Trees, Logistic Regression and Random Forest. Multilayer Perceptrons also have the lowest variability, around an order of magnitude less than Random Forests. Median area under the ROC curve, average precision and positive predictive value range from 0.789 to 0.802, 0.237 to 0.257 and 0.485 to 0.511, respectively. The median Brier score is 0.048 for all techniques. The performance distributions of the algorithms overlap to some extent. For instance, Gradient Boosting Decision Trees perform better than Logistic Regression more than 75% of the time, but not always.
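For readers unfamiliar with these metrics, the following minimal sketch shows how the Brier score, the positive predictive value among the K highest-risk patients, average precision, and the share of runs that beat a deterministic baseline could be computed. The function names and the toy numbers are assumptions made for illustration; they are not the study's code or data, and average precision is computed here without special handling of tied scores.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def ranked_labels(y_true, y_prob):
    """Outcomes ordered from highest to lowest predicted probability."""
    return [y for _, y in sorted(zip(y_prob, y_true), key=lambda t: -t[0])]

def ppv_at_top_k(y_true, y_prob, k):
    """Share of true events among the k patients with the highest predictions."""
    return sum(ranked_labels(y_true, y_prob)[:k]) / k

def average_precision(y_true, y_prob):
    """Mean of the precision values at the rank of each true event.

    Assumes at least one positive outcome is present.
    """
    hits, total = 0, 0.0
    for rank, y in enumerate(ranked_labels(y_true, y_prob), start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

def win_rate(run_scores, baseline):
    """Fraction of repeated runs whose score exceeds a single baseline score."""
    return sum(s > baseline for s in run_scores) / len(run_scores)

# Toy values (made up for illustration only).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]

print(f"Brier score       = {brier_score(y_true, y_prob):.3f}")
print(f"PPV in top 4      = {ppv_at_top_k(y_true, y_prob, 4):.2f}")
print(f"Average precision = {average_precision(y_true, y_prob):.3f}")
print(f"Win rate          = {win_rate([0.71, 0.74, 0.69, 0.73], 0.70):.2f}")
```

In the study's setting, K would be 20,000 and the baseline would be the single Logistic Regression score, so a win rate above 0.75 corresponds to a statement such as "better than Logistic Regression more than 75% of the time".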

CONCLUSIONS

All models have good global performance. The only family that is consistently superior to Logistic Regression is Multilayer Perceptron, which shows very reliable performance with the lowest variability.

