A privacy-preserving dependable deep federated learning model for identifying new infections from genome sequences.

Author information

Mehedi Sk Tanzir, Abdulrazak Lway Faisal, Ahmed Kawsar, Uddin Muhammad Shahin, Bui Francis M, Chen Li, Moni Mohammad Ali, Al-Zahrani Fahad Ahmed

Affiliations

Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail, 1902, Bangladesh.

Electrical Engineering Technical College, Middle Technical University, Baghdad, Iraq.

Publication information

Sci Rep. 2025 Mar 1;15(1):7291. doi: 10.1038/s41598-025-89612-x.

DOI:10.1038/s41598-025-89612-x

PMID:40025035

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11873272/

Abstract

The traditional molecular-based identification (TMID) technique for detecting new infections from genome sequences (GSs) has made significant contributions so far. However, because medical data are sensitive, TMID's reliance on transferring patient data to a central machine or server may create severe privacy and security issues. In recent years, deep federated learning (DFL) has progressed and achieved remarkable success in many domains, and it has emerged as a potential solution in this field. Therefore, we propose a dependable and privacy-preserving DFL-based model for identifying new infections from GSs. The unique contributions include automatic and effective feature selection suited to identifying new infections, the design of a dependable and privacy-preserving DFL-based LeNet model, and an evaluation on real-world data. To this end, a comprehensive experimental performance evaluation was conducted. The proposed model achieves an overall accuracy of 99.12% when the dataset is distributed among six clients in an independent and identically distributed (IID) manner. Moreover, for the same configuration the model achieves a precision of 98.23%, recall of 98.04%, F1-score of 96.24%, Cohen's kappa of 83.94%, and ROC AUC of 98.24%, a noticeable improvement over the other benchmark models. The proposed dependable model, together with these empirical results, is encouraging enough to be considered as an alternative for identifying new infections from other virus strains while ensuring the privacy and security of patients' data.
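
The six-client IID configuration mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which aggregation rule the authors use, so the sample-weighted averaging below (in the style of FedAvg) is an assumption, and the function names `iid_partition` and `federated_average`, the dataset size of 600, and the toy 3-parameter weight vectors are all hypothetical placeholders rather than anything taken from the paper.

```python
import random


def iid_partition(num_samples, num_clients, seed=0):
    """Shuffle sample indices and deal them out evenly, so each client
    receives a random (IID) share of the dataset."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::num_clients] for i in range(num_clients)]


def federated_average(client_weights, client_sizes):
    """Average per-client parameter vectors, weighting each client by the
    number of local samples it holds (assumed FedAvg-style rule)."""
    total = sum(client_sizes)
    num_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(num_params)
    ]


# Six clients, matching the configuration reported in the abstract.
# The dataset size of 600 is a made-up value for illustration.
shards = iid_partition(num_samples=600, num_clients=6)
sizes = [len(shard) for shard in shards]

# Hypothetical stand-in for local training: each client returns a toy
# 3-parameter vector. A real client would train a LeNet-style network on
# its own shard and share only the resulting weights, not the sequences.
local_weights = [[float(c), float(c) * 2.0, 1.0] for c in range(6)]

# The server combines the client updates into one global parameter vector.
global_weights = federated_average(local_weights, sizes)
```

In this setup only parameter vectors leave a client, which is the property that lets a federated model avoid moving patient data to a central server. Weighting by shard size matters little under an even IID split, but it keeps the average correct if clients hold different amounts of data.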
