Searching across-cohort relatives in 54,092 GWAS samples via encrypted genotype regression.

Affiliations

Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, China.

Center for Reproductive Medicine, Department of Genetic and Genomic Medicine, and Clinical Research Institute, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Hangzhou, Zhejiang, China.

Publication information

PLoS Genet. 2024 Jan 11;20(1):e1011037. doi: 10.1371/journal.pgen.1011037. eCollection 2024 Jan.

Abstract

Explicitly sharing individual-level data in genomics studies has many merits compared with sharing summary statistics, including stricter quality control (QC), common statistical analyses, relative identification, and improved statistical power in GWAS, but it is hampered by privacy or ethical constraints. In this study, we developed encG-reg, a regression approach that can detect relatives of various degrees from encrypted genomic data and is therefore not subject to these constraints. The encryption in encG-reg is based on random matrix theory: the original genotype matrix is masked by a random matrix without sacrificing the precision of the individual-level genotype data. We established a connection between the dimension of the random matrix that masks the genotype matrices and the precision a study requires of its encrypted genotype data. encG-reg has false positive and false negative rates equivalent to those obtained by sharing the original individual-level data, and it is computationally efficient when searching for relatives. We split the UK Biobank samples into their respective centers and then encrypted the genotype data. The relatives identified by encG-reg were as accurate as those identified by KING, a widely used software tool that requires the original genotype data. In a more complex application, we launched a carefully designed multi-center collaboration across 5 research institutes in China, covering 9 cohorts and 54,092 GWAS samples. encG-reg again identified true relatives across cohorts, even those with different ethnic backgrounds and genotyping quality. Our study demonstrates that encrypted genomic data can be shared without loss of information and without the barriers that apply to sharing raw data.
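The abstract does not spell out the masking scheme, so the following is only a minimal sketch of the general idea of masking genotype matrices with a shared random matrix, not the authors' implementation. It assumes Gaussian random entries, simulated genotypes, known allele frequencies, and a simple cross-product relatedness estimate; every name in it (`standardize`, `mask`, `relatedness`, `n_snps`, `k`) is hypothetical. Two simulated cohorts share one individual, and the relatedness computed from the masked matrices is compared with the value computed from the unmasked ones.

```python
import numpy as np

rng = np.random.default_rng(0)

n_snps = 2000  # number of SNPs shared by both cohorts (assumed)
k = 2000       # column dimension of the random masking matrix (assumed)

# Simulated allele frequencies and genotypes (0, 1, or 2 copies of an allele).
freq = rng.uniform(0.1, 0.5, size=n_snps)
cohort_a = rng.binomial(2, freq, size=(5, n_snps)).astype(float)
cohort_b = rng.binomial(2, freq, size=(5, n_snps)).astype(float)
cohort_b[0] = cohort_a[0]  # plant one individual present in both cohorts


def standardize(genotypes, freq):
    """Scale each SNP to mean 0 and variance 1 using its allele frequency."""
    return (genotypes - 2 * freq) / np.sqrt(2 * freq * (1 - freq))


def mask(std_genotypes, random_matrix):
    """Hide individual genotypes by projecting onto the shared random matrix."""
    return std_genotypes @ random_matrix


def relatedness(left, right, scale):
    """Cross-cohort relatedness as a scaled cross-product of two matrices."""
    return left @ right.T / scale


x_a = standardize(cohort_a, freq)
x_b = standardize(cohort_b, freq)

# Both cohorts apply the same random matrix; only masked data is exchanged.
shared_random = rng.standard_normal((n_snps, k))
masked_a = mask(x_a, shared_random)
masked_b = mask(x_b, shared_random)

plain = relatedness(x_a, x_b, n_snps)                 # needs raw genotypes
masked = relatedness(masked_a, masked_b, n_snps * k)  # needs masked data only

print("shared individual:", round(float(masked[0, 0]), 3))
print("unrelated pair:   ", round(float(masked[1, 1]), 3))
```

In this sketch the masked estimate tracks the unmasked one up to noise that shrinks as `k` grows, which is one way to read the abstract's link between the dimension of the random matrix and the precision a study requires.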

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2597/10783776/253ff258bf01/pgen.1011037.g001.jpg
