A multicenter random forest model for effective prognosis prediction in collaborative clinical research network.

Affiliations

Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China.

Department of Surgical Oncology, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China.

Publication information

Artif Intell Med. 2020 Mar;103:101814. doi: 10.1016/j.artmed.2020.101814. Epub 2020 Feb 5.

DOI:10.1016/j.artmed.2020.101814

PMID:32143809

Abstract

BACKGROUND

The accuracy of a prognostic prediction model is essential to the quality and reliability of the health-related decisions clinicians make in modern medicine. Unfortunately, individual institutions often lack enough samples to give such models sufficient statistical power. One mitigation is to expand data collection from a single institution to multiple centers, collectively increasing the sample size. However, sharing sensitive biomedical data for research raises complicated issues. Machine learning models such as random forests (RF), although commonly used and well performing for prognostic prediction, usually perform worse in multicenter privacy-preserving data mining scenarios than a centrally trained version.

METHODS AND MATERIALS

In this study, a multicenter random forest prognosis prediction model is proposed that enables federated clinical data mining from horizontally partitioned datasets. Using a novel data enhancement approach based on a differentially private generative adversarial network customized to clinical prognosis data, the proposed approach yields a multicenter RF model that performs on par with, or even better than, a centrally trained RF, without the need to aggregate the raw data. The model also incorporates an importance ranking step designed for feature selection without sharing patient-level information.
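
The importance ranking step can be pictured as each center sharing only a per-feature importance vector and its sample count, which a coordinator then combines. The sketch below assumes a sample-size-weighted mean as the combining rule; the abstract does not say how the ranking is actually aggregated, and the function name `rank_features`, the feature names, and the numbers are all hypothetical.

```python
from typing import Dict, List, Tuple

# One report per center: (number of local samples, {feature: local importance}).
SiteReport = Tuple[int, Dict[str, float]]


def rank_features(site_reports: List[SiteReport]) -> List[Tuple[str, float]]:
    """Combine per-site feature importances into one global ranking.

    Only summary statistics leave a site; no patient-level rows are exchanged.
    Assumption: a sample-size-weighted mean, which is not taken from the paper.
    """
    total = sum(n for n, _ in site_reports)
    if total <= 0:
        raise ValueError("at least one site must report samples")
    combined: Dict[str, float] = {}
    for n, importances in site_reports:
        for feature, value in importances.items():
            combined[feature] = combined.get(feature, 0.0) + value * n / total
    # Highest importance first; ties are broken alphabetically for determinism.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical reports from two centers of different size.
reports = [
    (300, {"stage": 0.5, "age": 0.3, "cea": 0.2}),
    (100, {"stage": 0.3, "age": 0.5, "cea": 0.2}),
]
ranking = rank_features(reports)
top_two = [name for name, _ in ranking[:2]]
```

A simplified model would then be trained on the top-ranked features only. Weighting by sample count keeps a small center from dominating the ranking, though other rules (for example, rank averaging) would fit the same interface.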

RESULTS

The proposed model was evaluated on colorectal cancer datasets from the US and China. Two groups of datasets with different levels of heterogeneity within the collaborative research network were selected. First, we compared the performance of the distributed random forest model under different privacy parameters and with different percentages of enhancement data, validating the effectiveness and plausibility of our approach. Then, we compared the discrimination and calibration ability of the proposed multicenter random forest with those of a centrally trained random forest model, other tree-based classifiers, and some commonly used machine learning methods. The results show that, in both groups, the proposed model provides better discrimination and calibration than the centrally trained RF model and the other candidate models while following the privacy-preserving rules. Additionally, a simplified model based on the feature importance ranking from the proposed approach also shows good discrimination and calibration.
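
Discrimination and calibration are commonly summarized with the area under the ROC curve (AUC) and the Brier score, respectively. The abstract does not name the metrics the authors used, so the following is only an illustrative sketch under that assumption; the labels and predicted probabilities are made up.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Discrimination: chance that a random positive case is scored above
    a random negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Calibration-sensitive error: mean squared gap between the predicted
    probability and the observed 0/1 outcome (lower is better)."""
    if len(y_true) == 0:
        raise ValueError("no predictions to score")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Made-up outcomes and predicted risks for four patients.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]
model_auc = auc(y_true, y_prob)
model_brier = brier_score(y_true, y_prob)
```

Comparing a multicenter model with a centrally trained one then amounts to computing both numbers on the same held-out patients for each model.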

CONCLUSION

The proposed random forest model exhibits ideal prediction capability using multicenter clinical data and overcomes the performance limitation arising from privacy guarantees. It can also provide a feature importance ranking across institutions without pooling the data at a central site. This study offers a practical solution for building a prognosis prediction model in a collaborative clinical research network and addresses practical issues in real-world applications of medical artificial intelligence.
