Potential and limitations of machine meta-learning (ensemble) methods for predicting COVID-19 mortality in a large inhospital Brazilian dataset.

Author information

Computer Science Department, Universidade Federal de Minas Gerais, Av. Presidente Antônio Carlos, 6627, Belo Horizonte, Brazil.

Universidade Federal de Minas Gerais, Av. Presidente Antônio Carlos, 6627, Belo Horizonte, Brazil.

Publication information

Sci Rep. 2023 Mar 1;13(1):3463. doi: 10.1038/s41598-023-28579-z.

Abstract

Most early scores and methods for predicting COVID-19 mortality suffer from methodological flaws and technological limitations (e.g., reliance on a single prediction model). We aim to provide a thorough comparative study that addresses these methodological issues by considering multiple techniques for building mortality prediction models, including modern machine learning (neural) algorithms, traditional statistical techniques, and meta-learning (ensemble) approaches. The study used a dataset from a multicenter cohort of 10,897 adult Brazilian COVID-19 patients admitted from March 2020 to November 2021 [median age 60 (interquartile range 48-71), 46% women]. We also propose new population-based meta-features that have not previously appeared in the literature. Stacking achieved the best results reported in the literature for the death prediction task, improving on the previous state of the art by more than 46% in Recall for predicting death, with an AUROC of 0.826 and a MacroF1 of 65.4%. The newly proposed meta-features were highly discriminative of death but did not produce large improvements in final prediction performance, suggesting that we may be at the limit of the prediction capability achievable with the current set of ML techniques and (meta-)features. Finally, we investigated how the trained models perform on different hospitals and found large differences in classifier performance between hospitals, further supporting the case that errors are produced by factors that cannot be modeled with the current predictors.
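For readers unfamiliar with the three metrics quoted in the abstract (Recall for the death class, MacroF1, and AUROC), the following is a minimal sketch of how each can be computed from first principles. It is not the authors' evaluation code: the ten toy patients, their scores, the 0.5 decision threshold, and all function names are assumptions made purely for illustration.

```python
# Toy illustration only: labels, scores, and threshold are invented, not study data.
# Label 1 = death, label 0 = survived.

def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) treating `positive` as the class of interest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so the minority class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_values = []
    for c in classes:
        p, r = precision_recall(y_true, y_pred, c)
        f1_values.append(2 * p * r / (p + r) if (p + r) else 0.0)
    return sum(f1_values) / len(f1_values)

def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted death probabilities for ten patients (three deaths).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.1, 0.2, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

death_recall = precision_recall(y_true, y_pred, positive=1)[1]
print(f"Recall (death): {death_recall:.3f}")
print(f"MacroF1:        {macro_f1(y_true, y_pred):.3f}")
print(f"AUROC:          {auroc(y_true, scores):.3f}")
```

Because MacroF1 averages the per-class F1 scores without weighting by class size, errors on the smaller death class pull it down as much as errors on the larger survivor class. AUROC is computed from the scores alone and does not depend on the chosen threshold.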
