Annan Richard, Noland Justin, Perkins Kamaria, Yuan Xiaohong, Roy Kaushik, Qingge Letu
Department of Computer Science, North Carolina A&T State University, 1601 East Market St, Greensboro, NC 27411 USA.
Discov Comput. 2025;28(1):108. doi: 10.1007/s10791-025-09627-w. Epub 2025 Jun 6.
Rapid advances in sequencing technologies have greatly increased access to genomic data stored in public databases, raising significant privacy and security concerns. This review analyzes vulnerabilities in current storage and sharing practices to underscore the importance of protecting genomic data. It examines the risks that genetic databases face from cyber-attacks and internal breaches, with particular attention to advanced AI-driven threats and quantum computing vulnerabilities. The review then explores machine learning methods designed to secure data, highlighting privacy-preserving approaches that maintain data confidentiality, such as differential privacy, federated learning, and synthetic data generation using Generative Adversarial Networks (GANs). The findings show progress in mitigating common privacy breaches such as re-identification and inference attacks. However, vulnerabilities persist, particularly to emerging threats such as model inversion and membership inference attacks. The review advocates an integrated approach that combines robust legislative frameworks with advanced technology to address genomic privacy challenges, and it calls for intensified research efforts to safeguard genomic information. In particular, there is an urgent need to adopt quantum-resistant cryptographic methods, including lattice-based encryption and blockchain-integrated security frameworks. The paper emphasizes that genomics researchers must prioritize data privacy and security so that genomic information is handled responsibly in research.
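As an illustration of one technique the abstract names, differential privacy, the following is a minimal sketch of the standard Laplace mechanism applied to a carrier count. It is not taken from the review. The function names `laplace_noise` and `private_count`, and the encoding of genotypes as 0/1/2 alternate-allele counts, are assumptions made for this example. It relies on the fact that the difference of two independent exponential draws with the same rate follows a Laplace distribution.

```python
import random


def laplace_noise(scale, rng):
    """Sample zero-mean Laplace noise with the given scale.

    The difference of two independent Exponential(rate = 1/scale) draws
    is Laplace-distributed with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(genotypes, epsilon, rng=None):
    """Release a noisy count of variant carriers (hypothetical helper).

    genotypes: iterable of alternate-allele counts per person (assumed 0, 1, or 2).
    epsilon:   privacy budget; smaller values add more noise.

    A counting query has sensitivity 1, because adding or removing one
    person changes the count by at most 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for g in genotypes if g > 0)
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    cohort = [0, 1, 2, 0, 1]  # three carriers in this toy cohort
    print(private_count(cohort, epsilon=1.0))
```

Each call returns the true count plus fresh noise, so repeated releases of the same query consume additional privacy budget. This sketch uses ordinary floating-point sampling and is meant only to show the idea, not to serve as a hardened implementation.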