Privacy-preserving cancer type prediction with homomorphic encryption.

Affiliations

Tandon School of Engineering, New York University, Brooklyn, NY, 11201, USA.

Center for Cyber Security, New York University Abu Dhabi, Abu Dhabi, 129188, UAE.

Publication information

Sci Rep. 2023 Jan 30;13(1):1661. doi: 10.1038/s41598-023-28481-8.

DOI: 10.1038/s41598-023-28481-8

PMID: 36717667

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9886900/

Abstract

Cancer genomics tailors diagnosis and treatment based on an individual's genetic information and is the crux of precision medicine. However, analyzing and maintaining high volumes of genetic mutation data to build a machine learning (ML) model that predicts the cancer type is computationally expensive and is often outsourced to powerful cloud servers, raising critical privacy concerns for patients' data. Homomorphic encryption (HE) enables computation on encrypted data, thereby providing cryptographic guarantees to protect privacy, but the restrictive overhead of encrypted computation deters its use. In this work, we explore the challenges of privacy-preserving cancer type prediction using a dataset of more than 2 million genetic mutations from 2713 patients across several cancer types, by building a highly accurate ML model and then implementing its privacy-preserving version in HE. Our solution for cancer type inference encodes somatic mutations into the feature space based on their impact on the cancer genomes and then uses statistical tests for feature selection. We propose a fast matrix multiplication algorithm for HE-based models. Our final model achieves a micro-average area under the curve of 0.98, improves accuracy from 70.08% to 83.61%, and is 550 times faster than standard matrix multiplication-based privacy-preserving models. Our tool can be found at https://github.com/momalab/octal-candet.
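The abstract does not describe the proposed fast matrix multiplication algorithm, so the sketch below is not that algorithm. It only illustrates, on plaintext lists, a commonly used baseline for matrix-vector products under batched HE (the diagonal method), where the only available operations are slot-wise addition, slot-wise multiplication, and cyclic rotation. The helper names `rotate`, `mul`, `add`, `diagonals`, and `matvec_diagonal` are hypothetical, and the model weights are assumed to form a square plaintext matrix.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rotate(v: Vector, k: int) -> Vector:
    """Cyclic left rotation by k slots (stands in for a ciphertext rotation)."""
    k %= len(v)
    return v[k:] + v[:k]


def mul(a: Vector, b: Vector) -> Vector:
    """Slot-wise product (stands in for a plaintext-ciphertext multiplication)."""
    return [x * y for x, y in zip(a, b)]


def add(a: Vector, b: Vector) -> Vector:
    """Slot-wise sum (stands in for a ciphertext addition)."""
    return [x + y for x, y in zip(a, b)]


def diagonals(matrix: Matrix) -> List[Vector]:
    """Generalized diagonals of a square matrix: diag d holds M[i][(i + d) % n]."""
    n = len(matrix)
    return [[matrix[i][(i + d) % n] for i in range(n)] for d in range(n)]


def matvec_diagonal(matrix: Matrix, x: Vector) -> Vector:
    """Compute M @ x using only rotate, mul, and add on whole vectors."""
    acc = [0.0] * len(x)
    for d, diag in enumerate(diagonals(matrix)):
        # Slot i receives M[i][(i + d) % n] * x[(i + d) % n].
        acc = add(acc, mul(diag, rotate(x, d)))
    return acc


weights = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # toy model weights (assumption)
features = [1, 0, 2]                         # toy encoded mutation features (assumption)
print(matvec_diagonal(weights, features))
```

For an n-by-n matrix this baseline needs n rotations and n multiplications on ciphertexts, which gives a sense of why matrix products dominate the cost of encrypted inference and why a faster scheme matters.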

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44e2/9886900/4d64876ccd3a/41598_2023_28481_Fig1_HTML.jpg

Similar articles

Privacy-preserving approximate GWAS computation based on homomorphic encryption.

BMC Med Genomics. 2020 Jul 21;13(Suppl 7):77. doi: 10.1186/s12920-020-0722-1.

Secure tumor classification by shallow neural network using homomorphic encryption.

BMC Genomics. 2022 Apr 9;23(1):284. doi: 10.1186/s12864-022-08469-w.

Privacy-preserving model evaluation for logistic and linear regression using homomorphically encrypted genotype data.

J Biomed Inform. 2024 Aug;156:104678. doi: 10.1016/j.jbi.2024.104678. Epub 2024 Jun 25.

Private queries on encrypted genomic data.

BMC Med Genomics. 2017 Jul 26;10(Suppl 2):45. doi: 10.1186/s12920-017-0276-z.

Preserving Health Care Data Security and Privacy Using Carmichael's Theorem-Based Homomorphic Encryption and Modified Enhanced Homomorphic Encryption Schemes in Edge Computing Systems.

Big Data. 2022 Feb;10(1):1-17. doi: 10.1089/big.2021.0012. Epub 2021 Aug 10.

Finding Highly Similar Regions of Genomic Sequences Through Homomorphic Encryption.

J Comput Biol. 2024 Mar;31(3):197-212. doi: 10.1089/cmb.2023.0050.

Privacy-preserving breast cancer recurrence prediction based on homomorphic encryption and secure two party computation.

PLoS One. 2021 Dec 20;16(12):e0260681. doi: 10.1371/journal.pone.0260681. eCollection 2021.

Privacy-preserving logistic regression training.

BMC Med Genomics. 2018 Oct 11;11(Suppl 4):86. doi: 10.1186/s12920-018-0398-y.

BLOOM: BLoom filter based oblivious outsourced matchings.

BMC Med Genomics. 2017 Jul 26;10(Suppl 2):44. doi: 10.1186/s12920-017-0277-y.

Cited by

Private pathological assessment via machine learning and homomorphic encryption.

BioData Min. 2024 Sep 10;17(1):33. doi: 10.1186/s13040-024-00379-9.

Adaptive Autonomous Protocol for Secured Remote Healthcare Using Fully Homomorphic Encryption (AutoPro-RHC).

Sensors (Basel). 2023 Oct 16;23(20):8504. doi: 10.3390/s23208504.

Session Introduction: TOWARDS ETHICAL BIOMEDICAL INFORMATICS: LEARNING FROM OLELO NOEAU, HAWAIIAN PROVERBS.

Pac Symp Biocomput. 2023;28:461-471.

Multiclass Cancer Prediction Based on Copy Number Variation Using Deep Learning.

Comput Intell Neurosci. 2022 Jun 9;2022:4742986. doi: 10.1155/2022/4742986. eCollection 2022.
