Data Sanitization to Reduce Private Information Leakage from Functional Genomics.

Affiliations

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA.

Stanford University School of Medicine, Department of Genetics, Stanford, CA 94305, USA.

Publication information

Cell. 2020 Nov 12;183(4):905-917.e16. doi: 10.1016/j.cell.2020.09.036.

DOI: 10.1016/j.cell.2020.09.036
PMID: 33186529
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7672785/
Abstract

The generation of functional genomics datasets is surging, because they provide insight into gene regulation and organismal phenotypes (e.g., genes upregulated in cancer). The intent behind functional genomics experiments is not necessarily to study genetic variants, yet they pose privacy concerns due to their use of next-generation sequencing. Moreover, there is a great incentive to broadly share raw reads for better statistical power and general research reproducibility. Thus, we need new modes of sharing beyond traditional controlled-access models. Here, we develop a data-sanitization procedure allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs. Our protocol works with traditional Illumina-based assays and newer technologies such as 10x single-cell RNA sequencing. It involves quantifying the privacy leakage in reads by statistically linking study participants to known individuals. We carried out these linkages using data from highly accurate reference genomes and more realistic environmental samples.
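
The idea of quantifying leakage by linking a study participant to known individuals can be illustrated with a toy example. The sketch below is not the authors' procedure: the function names (`match_score`, `link`, `mask_sites`), the genotype encoding, and the tiny panel are all assumptions made for illustration. It scores each known individual by the fraction of observed variant sites that agree, and uses the gap between the best and second-best score as a crude leakage indicator. Masking sites stands in for sanitization.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encoding: 0, 1, or 2 alternate alleles; None = site not observed.
Genotype = Optional[int]


def match_score(observed: List[Genotype], known: List[int]) -> float:
    """Fraction of observed sites whose genotype agrees with a known individual."""
    pairs = [(o, k) for o, k in zip(observed, known) if o is not None]
    if not pairs:
        return 0.0
    return sum(o == k for o, k in pairs) / len(pairs)


def link(observed: List[Genotype], panel: Dict[str, List[int]]) -> Tuple[str, float, float]:
    """Return (best-matching name, its score, gap to the runner-up score)."""
    ranked = sorted(
        ((match_score(observed, g), name) for name, g in panel.items()),
        reverse=True,
    )
    best_score, best_name = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
    return best_name, best_score, best_score - runner_up


def mask_sites(observed: List[Genotype], sites: Set[int]) -> List[Genotype]:
    """Toy stand-in for sanitization: hide the genotype at the given site indices."""
    return [None if i in sites else g for i, g in enumerate(observed)]


# Illustrative data only: three known individuals and one partially observed participant.
panel = {
    "A": [0, 1, 2, 1, 0, 2],
    "B": [0, 1, 1, 1, 0, 0],
    "C": [2, 0, 2, 1, 1, 2],
}
observed = [0, 1, 2, None, 0, 2]  # genotypes called from participant A's reads

name, score, gap = link(observed, panel)
print(name, score, round(gap, 3))

# After masking the two most discriminating sites, A and B tie, so the gap drops to zero.
_, _, masked_gap = link(mask_sites(observed, {2, 5}), panel)
print(masked_gap)
```

In this toy setting a large gap means the participant is easy to single out, and masking informative sites shrinks that gap at the cost of discarding data, which is the privacy-utility trade-off the abstract describes in general terms.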
