
FedSPL: federated self-paced learning for privacy-preserving disease diagnosis.

Affiliation

Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, China.

Publication Information

Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab498.

Abstract

The growing availability of data in medical fields can help improve the performance of machine learning methods. With healthcare data, however, using multi-institutional datasets is challenging because of privacy and security concerns, so privacy-preserving machine learning methods are required. We therefore use federated learning to train a shared global model on a central server that holds no private data, while every client keeps its sensitive data within its own institution. The scattered training data are thus connected to improve model performance while preserving data privacy. During federated training, however, data errors or noise can degrade learning performance. We therefore introduce self-paced learning, which effectively selects high-confidence samples and drops highly noisy ones, improving the performance of the trained model and reducing the risk of data privacy leakage. We propose federated self-paced learning (FedSPL), which combines the advantages of federated learning and self-paced learning. The proposed FedSPL model was evaluated on gene expression data distributed across different institutions, where privacy concerns must be considered. The results demonstrate that the proposed FedSPL model is secure, i.e. it does not expose original records to other parties, and that its computational overhead during training is acceptable. Compared with learning methods based on each party's local data, the proposed model significantly improves the predicted F1-score, by approximately 4.3%. We believe the proposed method has the potential to benefit clinicians in gene selection and disease prognosis.
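To make the idea concrete, below is a minimal, self-contained sketch (not the authors' implementation) of combining federated averaging with self-paced sample selection. All names, the logistic-regression model, the indicator-weight rule, and the pace schedule `lam *= 1.2` are illustrative assumptions: each simulated client trains locally, dropping samples whose current loss exceeds the pace parameter, and the server only ever aggregates model weights, never raw records.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_update(w, X, y, lam, lr=0.1, epochs=5):
    """One client's self-paced local training: samples whose current loss
    exceeds the pace parameter `lam` get weight 0 (treated as noise),
    the rest weight 1; only the weighted gradient is applied."""
    w = w.copy()
    for _ in range(epochs):
        p = sigmoid(X @ w)
        loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        v = (loss <= lam).astype(float)          # self-paced indicator weights
        grad = X.T @ (v * (p - y)) / max(v.sum(), 1.0)
        w -= lr * grad
    return w

# Simulate three institutions, each holding private data with ~10% label noise.
d = 5
w_true = rng.normal(size=d)
clients = []
for _ in range(3):
    X = rng.normal(size=(200, d))
    y = (X @ w_true > 0).astype(float)
    flip = rng.random(200) < 0.1
    y[flip] = 1.0 - y[flip]
    clients.append((X, y))

# Federated rounds: the server sees only model weights, never raw data.
w = np.zeros(d)
lam = 1.0
for _ in range(20):
    updates = [local_update(w, X, y, lam) for X, y in clients]
    w = np.mean(updates, axis=0)   # FedAvg-style aggregation
    lam *= 1.2                     # anneal the pace: admit harder samples later

# Evaluate the shared global model on clean held-out data.
X_test = rng.normal(size=(500, d))
y_test = (X_test @ w_true > 0).astype(float)
acc = float(((X_test @ w > 0) == (y_test > 0.5)).mean())
print(f"global model accuracy: {acc:.3f}")
```

The key design point mirrored from the abstract: the indicator weights `v` exclude high-loss (likely noisy) samples from each local gradient, while the growing pace parameter gradually admits harder samples as the global model matures.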

