Zhang Wentai, Wu Xueyang, Wang He, Wu Ruopei, Deng Congcong, Xu Qian, Liu Xiaohai, Bai Xuexue, Yang Shuangjian, Li Xiaoxu, Feng Ming, Yang Qiang, Wang Renzhi
Department of Thoracic Surgery, Peking University First Hospital, Beijing, China.
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China.
World Neurosurg. 2025 Jan;193:1036-1046. doi: 10.1016/j.wneu.2024.10.091. Epub 2024 Nov 20.
Decentralized federated learning (DFL) may serve as a useful framework for machine learning (ML) tasks in multicentered studies, maximizing the use of clinical data without data sharing. We aim to propose the first DFL workflow for ML tasks in multicentered studies, one that can be as powerful as workflows using centralized data.
A DFL workflow was developed with 4 steps: registration, local computation, model update, and inspection. A total of 598 participants with acromegaly from Peking Union Medical College Hospital and 120 participants from Xuanwu Hospital were enrolled. The cohort from Peking Union Medical College Hospital was further split into 5 centers. Nine clinical features were incorporated into ML-based models trained with 4 algorithms: logistic regression (LR), gradient boosted decision tree, support vector machine (SVM), and deep neural network (DNN). The area under the receiver operating characteristic curve was used to evaluate the performance of the models.
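The 4-step loop above can be sketched in code. This is a minimal illustration, assuming a FedAvg-style weighted parameter average for the model-update step (the abstract does not specify the aggregation rule); the centers, cohort sizes, and data below are synthetic, and only model parameters, never raw data, leave each center.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CENTERS, N_FEATURES, N_ROUNDS = 5, 9, 50

# Step 1 (registration): each center registers a private cohort.
# Here we simulate one cohort per center sharing a common true rule.
true_w = rng.normal(size=N_FEATURES)

def make_cohort(n):
    X = rng.normal(size=(n, N_FEATURES))
    y = (X @ true_w + rng.normal(scale=0.5, size=n) > 0).astype(float)
    return X, y

cohorts = [make_cohort(int(rng.integers(80, 160))) for _ in range(N_CENTERS)]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_update(w, X, y, lr=0.1, epochs=5):
    # Step 2 (local computation): logistic-regression gradient steps
    # on the center's own data only.
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w = w - lr * grad
    return w

w_global = np.zeros(N_FEATURES)
for _ in range(N_ROUNDS):
    # Step 3 (model update): only parameters are exchanged and combined
    # by a sample-size-weighted average.
    local_ws = [local_update(w_global, X, y) for X, y in cohorts]
    sizes = np.array([len(y) for _, y in cohorts], dtype=float)
    w_global = np.average(local_ws, axis=0, weights=sizes)

# Step 4 (inspection): evaluate the pooled model on held-out data.
X_test, y_test = make_cohort(500)
acc = float(np.mean((sigmoid(X_test @ w_global) > 0.5) == y_test))
print(f"federated model accuracy: {acc:.2f}")
```

The weighted average gives larger cohorts proportionally more influence on the shared parameters, which mirrors how a center with 598 participants would dominate one with 120.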
Models trained with the DFL workflow performed better than most models for LR (P < 0.05) and better than all models for DNN, SVM, and gradient boosted decision tree (P < 0.05). Models trained with the DFL workflow performed as well as models trained on centralized data for LR, DNN, and SVM (P > 0.05).
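The evaluation metric used above, the area under the receiver operating characteristic curve, equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (the Mann-Whitney U statistic normalized by the number of positive-negative pairs). A small self-contained illustration with made-up labels and scores:

```python
import numpy as np

def auc(y_true, scores):
    # AUC as the normalized Mann-Whitney U statistic: the fraction of
    # (positive, negative) pairs where the positive scores higher,
    # counting ties as half a win.
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(auc(y, s))  # -> 0.75
```

In a DFL setting this computation needs only each model's predicted scores on a shared test set, so it fits the inspection step without any exchange of training data.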
We demonstrate that the DFL workflow, which requires no data sharing, is a more appropriate method for ML tasks in multicentered studies. The DFL workflow should be further exploited in clinical research in other departments, where it can encourage and facilitate multicentered studies.