Chai Hua, Huang Yiqian, Xu Lekai, Song Xinpeng, He Minfan, Wang Qingyong
School of Mathematics and Big Data, Foshan University, Foshan, 528000, China.
School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China.
Heliyon. 2024 May 23;10(11):e31873. doi: 10.1016/j.heliyon.2024.e31873. eCollection 2024 Jun 15.
Survival prediction is one of the central goals of precision medicine, because accurate survival assessment can help physicians choose appropriate treatments for individual patients. Achieving this requires training prediction models on large amounts of data to avoid overfitting. However, collecting patient data for disease prediction is difficult: data sources may differ across institutions, and data sharing raises privacy and ownership concerns. To integrate cancer data from different institutions without violating privacy laws, we developed AdFed, a federated learning-based data integration framework that combines decentralized federated learning with a regularization method to evaluate patients' survival while protecting their privacy.
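The abstract does not specify AdFed's model, update rule, or regularizer, so the following is only a minimal sketch of the general idea under stated assumptions: each institution fits a model on its own patients and shares only model weights, never raw records. As a stand-in, it uses federated averaging (FedAvg, a server-coordinated scheme, whereas AdFed is described as decentralized) over L1-penalized logistic regression on a hypothetical binary label (e.g. "survived past a fixed horizon"). The L1 penalty is one common way to drive uninformative gene weights toward zero. All function names and the synthetic data generator are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    # Proximal operator of the L1 penalty t*|w|: shrinks w toward zero
    # and sets it exactly to zero when |w| <= t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def local_update(weights, data, lr=0.1, lam=0.01, epochs=5):
    """Proximal gradient descent on ONE site's private data
    (logistic loss + L1 penalty; the bias is penalized too, for brevity).
    Only the updated weights leave the site."""
    w = list(weights)
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj / n
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w


def federated_average(site_weights, site_sizes):
    # FedAvg aggregation: sample-size-weighted mean of the sites' weights.
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
            for j in range(dim)]


def train_federated(sites, dim, rounds=30, **kw):
    # Each round: broadcast global weights, train locally, aggregate.
    global_w = [0.0] * dim
    for _ in range(rounds):
        local = [local_update(global_w, data, **kw) for data in sites]
        global_w = federated_average(local, [len(d) for d in sites])
    return global_w


def make_site(n, seed):
    # Hypothetical synthetic "institution": bias + 4 gene features,
    # where only gene 1 actually influences the binary outcome.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [1.0] + [rng.gauss(0, 1) for _ in range(4)]
        y = 1 if x[1] + 0.3 * rng.gauss(0, 1) > 0 else 0
        data.append((x, y))
    return data


if __name__ == "__main__":
    sites = [make_site(100, seed) for seed in range(3)]
    w = train_federated(sites, dim=5, rounds=30)
    print([round(v, 3) for v in w])
```

On this synthetic data the weight for the informative feature should dominate, while the penalty keeps the noise features small; a real survival setting would more likely use a censoring-aware loss (for example a Cox partial likelihood) instead of the logistic loss assumed here.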
AdFed was tested on several cancer datasets containing information on patients from different institutions. The experimental results show that, using distributed data, AdFed achieves better cancer survival prediction performance (AUC = 0.605) than the compared federated learning-based methods (average AUC = 0.554). In addition, to assess the biological interpretability of our method, the case study lists 10 liver cancer-related genes selected by AdFed, five of which have been confirmed by a literature review.
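To help interpret the reported figures: a ROC AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so an AUC of 0.605 means that a randomly chosen positive case is ranked above a randomly chosen negative case about 60.5% of the time. The abstract does not state exactly how AUC was computed in the survival setting (for example, at which time horizon). The sketch below shows only the standard binary ROC AUC through its pairwise (Mann-Whitney) definition. The function name is hypothetical.

```python
def auc(scores, labels):
    """Binary ROC AUC: probability that a random positive is scored above
    a random negative, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four positive/negative pairs are ranked correctly.
print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

The quadratic pairwise loop is fine for illustration; library implementations use a sort-based computation that gives the same value.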
The results indicate that AdFed outperforms other federated learning-based methods and that its interpretable algorithm can select biologically significant genes and pathways while ensuring the confidentiality and integrity of the data.