Improving prediction for medical institution with limited patient data: Leveraging hospital-specific data based on multicenter collaborative research network.

Author information

Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China.

Department of Surgical Oncology, Second affiliated hospital, Zhejiang University School of Medicine, Hangzhou, China.

Publication information

Artif Intell Med. 2021 Mar;113:102024. doi: 10.1016/j.artmed.2021.102024. Epub 2021 Jan 23.

DOI:10.1016/j.artmed.2021.102024

PMID:33685587

Abstract

BACKGROUND AND OBJECTIVE

Clinical decision support assisted by prediction models usually faces the challenges of limited clinical data and a lack of labels when the model is developed with data from a single medical institution. Accordingly, research on multicenter clinical collaborative networks, which can provide external medical data, has received increasing attention. With the increasing availability of machine learning techniques such as transfer learning, leveraging large-scale patient data from multiple hospitals to build data-driven predictive models with clinical application potential provides an alternative solution to address the problem of limited patient data.

METHODS

This study proposes a multicenter hybrid semi-supervised transfer learning model (MHSTL) built on a unified common data model, which ensures a standardized representation of multicenter data. Hospital-specific features, along with the features that co-occur across domains, are then aligned through a representation learning architecture based on deep neural networks and the newly proposed neural decision forest model. Limited patient data from the target hospital, both labeled and unlabeled, are incorporated during feature adaptation, contributing to better model performance. Without patient-level data sharing, the proposed learning strategy overcomes feature misalignment and distribution divergence and enables multi-source transfer learning when patient data at the target hospital are insufficient and unlabeled.
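The feature misalignment problem mentioned above can be illustrated with a minimal sketch. It is not the MHSTL architecture: it only shows one simple way to place records from hospitals with different feature sets into a shared layout, with a mask marking which features a hospital actually records. The function names, the hospital names, and the feature names (`age`, `stage`, `cea`, `bmi`, `albumin`) are all hypothetical.

```python
from typing import Dict, List, Tuple


def build_feature_index(
    hospital_features: Dict[str, List[str]]
) -> Tuple[List[str], List[str]]:
    """Split feature names into those every hospital records (shared)
    and those recorded by only some hospitals (hospital-specific)."""
    feature_sets = [set(names) for names in hospital_features.values()]
    shared = set.intersection(*feature_sets)
    everything = set.union(*feature_sets)
    return sorted(shared), sorted(everything - shared)


def align_record(
    record: Dict[str, float], shared: List[str], specific: List[str]
) -> Tuple[List[float], List[float]]:
    """Map one patient record onto the unified layout (shared + specific).

    Returns (values, mask). A mask entry is 1.0 when the feature is present
    in the record and 0.0 when it is absent, in which case the value is a
    placeholder 0.0 that a downstream model should ignore.
    """
    layout = shared + specific
    values = [float(record.get(name, 0.0)) for name in layout]
    mask = [1.0 if name in record else 0.0 for name in layout]
    return values, mask


# Hypothetical feature inventories for two source hospitals and one target.
hospital_features = {
    "source_a": ["age", "stage", "cea"],
    "source_b": ["age", "stage", "bmi"],
    "target": ["age", "stage", "cea", "albumin"],
}
shared, specific = build_feature_index(hospital_features)
values, mask = align_record({"age": 63, "stage": 3, "albumin": 4.1}, shared, specific)
```

In this toy layout the shared block could feed a common encoder while the masked hospital-specific block is handled separately; how the paper's representation learning architecture actually combines them is not described in the abstract.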

RESULTS

The effectiveness of the proposed transfer learning model was evaluated on a collaborative research network of colorectal cancer patients in the US and China. The results demonstrate that the proposed model can achieve much better performance for predicting target risk with limited resources on patient data than baseline models. Better discrimination and calibration ability are also observed when sufficient labeled data are not available in the target hospital for prognosis prediction tasks. Further exploratory experiments show that the proposed approach exhibits good model generalizability regardless of the data heterogeneity. With the help of the SHapley Additive exPlanations for model interpretation, the effectiveness of incorporating hospital-specific features in the transfer learning model is shown.
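The abstract reports discrimination and calibration without naming the metrics. As a rough sketch, assuming the area under the ROC curve (AUC) for discrimination and the Brier score for calibration (both common choices, but an assumption here), the two quantities could be computed as follows. The labels and predicted risks are made-up toy values, not data from the study.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Discrimination: probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half). Equivalent to the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Calibration-oriented error: mean squared difference between the
    predicted probability and the observed 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy example with hypothetical outcomes and predicted risks.
labels = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
discrimination = auc(labels, risks)
calibration_error = brier_score(labels, risks)
```

A higher AUC indicates better ranking of high-risk patients above low-risk ones, while a lower Brier score indicates predicted risks closer to the observed outcomes.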

CONCLUSIONS

The proposed method can develop prediction models from multiple source hospitals and achieves good performance by leveraging cross-domain hospital-specific feature information, thereby enhancing prediction when the model is applied to a single medical institution with limited patient data.

Similar articles

Improving prediction for medical institution with limited patient data: Leveraging hospital-specific data based on multicenter collaborative research network.

Artif Intell Med. 2021 Mar;113:102024. doi: 10.1016/j.artmed.2021.102024. Epub 2021 Jan 23.

A multicenter random forest model for effective prognosis prediction in collaborative clinical research network.

Artif Intell Med. 2020 Mar;103:101814. doi: 10.1016/j.artmed.2020.101814. Epub 2020 Feb 5.

Establishment and evaluation of a multicenter collaborative prediction model construction framework supporting model generalization and continuous improvement: A pilot study.

Int J Med Inform. 2020 Sep;141:104173. doi: 10.1016/j.ijmedinf.2020.104173. Epub 2020 May 30.

Multi-class motor imagery EEG classification using collaborative representation-based semi-supervised extreme learning machine.

Med Biol Eng Comput. 2020 Sep;58(9):2119-2130. doi: 10.1007/s11517-020-02227-4. Epub 2020 Jul 16.

Transferability of artificial neural networks for clinical document classification across hospitals: A case study on abnormality detection from radiology reports.

J Biomed Inform. 2018 Sep;85:68-79. doi: 10.1016/j.jbi.2018.07.017. Epub 2018 Jul 17.

POPCORN: A web service for individual PrognOsis prediction based on multi-center clinical data CollabORatioN without patient-level data sharing.

J Biomed Inform. 2018 Oct;86:1-14. doi: 10.1016/j.jbi.2018.08.008. Epub 2018 Aug 10.

A transfer learning model with multi-source domains for biomedical event trigger extraction.

BMC Genomics. 2021 Jan 7;22(1):31. doi: 10.1186/s12864-020-07315-1.

Clinical Document Classification Using Labeled and Unlabeled Data Across Hospitals.

AMIA Annu Symp Proc. 2018 Dec 5;2018:545-554. eCollection 2018.

A generalized AI method for pathology cancer diagnosis and prognosis prediction based on transfer learning and hierarchical split.

Phys Med Biol. 2023 Aug 29;68(17). doi: 10.1088/1361-6560/aced34.

Learning image features with fewer labels using a semi-supervised deep convolutional network.

Neural Netw. 2020 Dec;132:131-143. doi: 10.1016/j.neunet.2020.08.016. Epub 2020 Aug 25.

Cited by

Bridging Data Gaps in Healthcare: A Scoping Review of Transfer Learning in Structured Data Analysis.

Health Data Sci. 2025 Sep 3;5:0321. doi: 10.34133/hds.0321. eCollection 2025.

Predicting liver metastasis in colorectal cancer patients using routine biochemical tests enhanced by machine learning.

Clin Transl Oncol. 2025 Jul 17. doi: 10.1007/s12094-025-03996-w.

Enhancing Patient Selection in Sepsis Clinical Trials Design Through an AI Enrichment Strategy: Algorithm Development and Validation.

J Med Internet Res. 2024 Sep 4;26:e54621. doi: 10.2196/54621.

Intelligent oncology: The convergence of artificial intelligence and oncology.

J Natl Cancer Cent. 2022 Dec 5;3(1):83-91. doi: 10.1016/j.jncc.2022.11.004. eCollection 2023 Mar.

OMOP CDM Can Facilitate Data-Driven Studies for Cancer Prediction: A Systematic Review.

Int J Mol Sci. 2022 Oct 5;23(19):11834. doi: 10.3390/ijms231911834.

Study on Ultrasonic Imaging of Nursing Care for Preventing and Treating Clinical Infection of Hemodialysis Patients Based on Smart Medical Big Data.

Contrast Media Mol Imaging. 2021 Dec 6;2021:2551063. doi: 10.1155/2021/2551063. eCollection 2021.
