Wu Qiong, Reps Jenna M, Li Lu, Zhang Bingyu, Lu Yiwen, Tong Jiayi, Zhang Dazheng, Lumley Thomas, Brand Milou T, Van Zandt Mui, Falconer Thomas, He Xing, Huang Yu, Li Haoyang, Yan Chao, Tang Guojun, Williams Andrew E, Wang Fei, Bian Jiang, Malin Bradley, Hripcsak George, Schuemie Martijn J, Lu Yun, Drew Steve, Zhou Jiayu, Asch David A, Chen Yong
Department of Biostatistics and Health Data Science, University of Pittsburgh, Pittsburgh, PA, USA.
Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA.
NPJ Digit Med. 2025 Jul 15;8(1):442. doi: 10.1038/s41746-025-01781-1.
Clinical insights from real-world data often require aggregating information across institutions to ensure sufficient sample sizes and generalizability. However, patient privacy concerns often limit the sharing of patient-level data, and traditional federated learning algorithms, which rely on extensive back-and-forth communication, can be inefficient to implement. We introduce the Collaborative One-shot Lossless Algorithm for Generalized Linear Models (COLA-GLM), a novel federated learning algorithm that supports diverse outcome types via generalized linear models and achieves results identical to a pooled patient-level data analysis (lossless) with only a single round of aggregated data exchange (one-shot). To further protect aggregated institutional data, we developed a secure extension, secure-COLA-GLM, utilizing homomorphic encryption. We demonstrated the effectiveness and lossless property of COLA-GLM through applications to an international influenza cohort and a decentralized U.S. COVID-19 mortality study. COLA-GLM and secure-COLA-GLM offer a scalable, efficient solution for decentralized collaborative learning involving multiple data partners and diverse security requirements.
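To illustrate the one-shot, lossless idea in its simplest case, the sketch below shows a linear model (the Gaussian special case of a GLM) fit by exchanging only aggregated summary statistics. This is a minimal illustration, not the paper's COLA-GLM implementation: site names, dimensions, and data are hypothetical, and the paper's method additionally handles non-Gaussian GLMs and encrypted aggregates. For linear regression, each site sharing its X'X and X'y once is sufficient for the coordinating site to recover exactly the pooled-data estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

def site_summaries(X, y):
    """Each site computes and shares only aggregated statistics (X'X, X'y),
    never patient-level rows."""
    return X.T @ X, X.T @ y

# Hypothetical patient-level data held at two institutions (never pooled).
X1 = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
X2 = np.column_stack([np.ones(150), rng.normal(size=(150, 2))])
beta_true = np.array([0.5, 1.0, -2.0])
y1 = X1 @ beta_true + rng.normal(size=100)
y2 = X2 @ beta_true + rng.normal(size=150)

# One-shot exchange: each site transmits its summaries a single time.
XtX1, Xty1 = site_summaries(X1, y1)
XtX2, Xty2 = site_summaries(X2, y2)

# Coordinating site solves the normal equations on the summed summaries.
beta_federated = np.linalg.solve(XtX1 + XtX2, Xty1 + Xty2)

# Reference: the estimate from (hypothetically) pooled patient-level data.
X_pool = np.vstack([X1, X2])
y_pool = np.concatenate([y1, y2])
beta_pooled, *_ = np.linalg.lstsq(X_pool, y_pool, rcond=None)

# Lossless: the federated and pooled estimates agree to machine precision.
print(np.allclose(beta_federated, beta_pooled))
```

For non-Gaussian outcomes (e.g., logistic regression), a single exchange of such fixed summaries is no longer sufficient in general, which is the gap the paper's COLA-GLM construction addresses.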