School of Public Health, University of São Paulo, São Paulo, SP, Brazil.
Brazilian Institute of Education, Development and Research-IDP, Economics Graduate Program, Brasilia, DF, Brazil.
Sci Rep. 2023 Jan 19;13(1):1022. doi: 10.1038/s41598-022-26467-6.
Machine learning algorithms are increasingly used in healthcare settings, but how well they generalize across regions is still unknown. This study aims to identify the training strategy that maximizes predictive performance for the risk of death from COVID-19 in different regions of a large and unequal country. This is a multicenter cohort study of patients with a positive RT-PCR test for COVID-19 from March to August 2020 (n = 8477) in 18 hospitals covering all five Brazilian regions. Of these patients, 2356 (28%) died. Eight strategies were used to train and evaluate three popular machine learning algorithms (extreme gradient boosting, LightGBM, and CatBoost). The strategies ranged from using training data from a single hospital only to aggregating patients by geographic region. Predictive performance was evaluated by the area under the ROC curve (AUROC) on the test set of each hospital. The best overall predictive performance was obtained with training data from the same hospital, which was the winning strategy for 11 (61%) of the 18 participating hospitals. In this study, adding patient data from other regions slightly decreased predictive performance. However, models trained in other hospitals still performed acceptably and could serve as a solution while data for a specific hospital are being collected.
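The evaluation logic described above (score each strategy by AUROC on a hospital's own test set, then count which strategy wins per hospital) can be illustrated with a minimal sketch. This is not the authors' code: the strategy names `same_hospital` and `all_regions`, the hospital labels, and all labels and scores below are made-up toy values chosen only for illustration, and the AUROC is computed with the standard rank-based (Mann-Whitney) definition rather than with any particular library.

```python
from collections import Counter


def auroc(y_true, y_score):
    """AUROC as the probability that a randomly chosen positive case
    (death = 1) receives a higher score than a randomly chosen negative
    case (survival = 0); ties count as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def winning_strategy(y_true, scores_by_strategy):
    """Return the strategy with the highest AUROC on one hospital's test set,
    together with the AUROC of every strategy."""
    aurocs = {name: auroc(y_true, s) for name, s in scores_by_strategy.items()}
    best = max(aurocs, key=aurocs.get)
    return best, aurocs


# Toy test sets (hypothetical): true outcomes plus the predicted risk
# produced by models trained under two illustrative strategies.
test_sets = {
    "hospital_A": {
        "y_true": [1, 0, 1, 0, 0],
        "scores": {
            "same_hospital": [0.9, 0.2, 0.7, 0.4, 0.1],
            "all_regions": [0.6, 0.5, 0.4, 0.3, 0.2],
        },
    },
    "hospital_B": {
        "y_true": [0, 1, 1, 0],
        "scores": {
            "same_hospital": [0.3, 0.6, 0.2, 0.5],
            "all_regions": [0.1, 0.8, 0.7, 0.4],
        },
    },
    "hospital_C": {
        "y_true": [1, 0, 0, 1],
        "scores": {
            "same_hospital": [0.8, 0.3, 0.1, 0.9],
            "all_regions": [0.5, 0.6, 0.2, 0.7],
        },
    },
}

# Tally how many hospitals each strategy wins.
tally = Counter()
for hospital, data in test_sets.items():
    best, aurocs = winning_strategy(data["y_true"], data["scores"])
    tally[best] += 1
    print(hospital, best, {k: round(v, 3) for k, v in aurocs.items()})

print(dict(tally))
```

In a real analysis the scores would come from fitted models and the AUROC would typically be computed with an established library; the pairwise loop here is quadratic in the number of patients and is meant only to make the definition explicit.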