Federated Deep Learning Enables Cancer Subtyping by Proteomics.

Authors

Cai Zhaoxiang, Boys Emma L, Noor Zainab, Aref Adel T, Xavier Dylan, Lucas Natasha, Williams Steven G, Koh Jennifer M S, Poulos Rebecca C, Wu Yangxiu, Dausmann Michael, MacKenzie Karen L, Aguilar-Mahecha Adriana, Armengol Carolina, Barranco Maria M, Basik Mark, Bowman Elise D, Clifton-Bligh Roderick, Connolly Elizabeth A, Cooper Wendy A, Dalal Bhavik, DeFazio Anna, Filipits Martin, Flynn Peter J, Graham J Dinny, George Jacob, Gill Anthony J, Gnant Michael, Habib Rosemary, Harris Curtis C, Harvey Kate, Horvath Lisa G, Jackson Christopher, Kohonen-Corish Maija R J, Lim Elgene, Liu Jia Jenny, Long Georgina V, Lord Reginald V, Mann Graham J, McCaughan Geoffrey W, Morgan Lucy, Murphy Leigh, Nagabushan Sumanth, Nagrial Adnan, Navinés Jordi, Panizza Benedict J, Samra Jaswinder S, Scolyer Richard A, Souglakos John, Swarbrick Alexander, Thomas David, Balleine Rosemary L, Hains Peter G, Robinson Phillip J, Zhong Qing, Reddel Roger R

Affiliations

ProCan, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, Australia.

Lady Davis Institute at the Jewish General Hospital, McGill University, Montreal, Canada.

Publication

Cancer Discov. 2025 Sep 4;15(9):1803-1818. doi: 10.1158/2159-8290.CD-24-1488.

Abstract

Artificial intelligence applications in biomedicine face major challenges from data privacy requirements. To address this issue for clinically annotated tissue proteomic data, we developed a federated deep learning approach (ProCanFDL), training local models on simulated sites containing data from a pan-cancer cohort (n = 1,260) and 29 cohorts held behind private firewalls (n = 6,265), representing 19,930 replicate data-independent acquisition mass spectrometry runs. Local parameter updates were aggregated to build the global model, achieving a 43% performance gain on the hold-out test set (n = 625) in 14 cancer subtyping tasks compared with local models and matching centralized model performance. The approach's generalizability was demonstrated by retraining the global model with data from two external, data-independent acquisition mass spectrometry cohorts (n = 55) and eight acquired by tandem mass tag proteomics (n = 832). ProCanFDL presents a solution for internationally collaborative machine learning initiatives using proteomic data, for example, for discovering predictive biomarkers or treatment targets while maintaining data privacy.

SIGNIFICANCE

A federated deep learning approach applied to human proteomic data, acquired using two distinct proteomic technologies from 40 tumor cohorts across eight countries, enabled accurate cancer histopathologic subtyping while preserving data privacy. This approach will enable the privacy-compliant development of large-scale proteomic artificial intelligence models, including foundation models, across institutions globally.
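The aggregation step described in the abstract, where local parameter updates are combined into a global model, can be illustrated with a minimal sketch. This assumes a FedAvg-style weighted average in which each site's parameters are weighted by its sample count; the abstract does not state the exact aggregation rule used by ProCanFDL, and the function name `federated_average`, the parameter name `w`, and the toy site sizes are hypothetical.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]


def federated_average(site_params: Sequence[Params],
                      site_sizes: Sequence[int]) -> Params:
    """Combine per-site parameters into global parameters.

    Assumption: FedAvg-style weighting by number of samples per site.
    Only parameters leave each site; the raw data never does.
    """
    if not site_params or len(site_params) != len(site_sizes):
        raise ValueError("need one sample count per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    global_params: Params = {}
    for name, reference in site_params[0].items():
        aggregated = [0.0] * len(reference)
        for params, n_samples in zip(site_params, site_sizes):
            weight = n_samples / total  # this site's share of all samples
            for i, value in enumerate(params[name]):
                aggregated[i] += weight * value
        global_params[name] = aggregated
    return global_params


# Toy example: two hypothetical sites holding 100 and 300 samples.
site_a = {"w": [1.0, 2.0]}
site_b = {"w": [3.0, 6.0]}
global_model = federated_average([site_a, site_b], [100, 300])
print(global_model)
```

In a full training loop, each site would retrain from the returned global parameters and send back new updates, repeating for several rounds.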

