GeneImp: Fast Imputation to Large Reference Panels Using Genotype Likelihoods from Ultralow Coverage Sequencing.

Authors

Spiliopoulou Athina, Colombo Marco, Orchard Peter, Agakov Felix, McKeigue Paul

Affiliations

Centre for Population Health Sciences, Usher Institute, University of Edinburgh, EH8 9AG, United Kingdom.

Pharmatics Ltd., Edinburgh, EH16 4UX, United Kingdom.

Publication information

Genetics. 2017 May;206(1):91-104. doi: 10.1534/genetics.117.200063. Epub 2017 Mar 27.

Abstract

We address the task of genotype imputation to a dense reference panel given genotype likelihoods computed from ultralow coverage sequencing as inputs. In this setting, the data have a high level of missingness or uncertainty, and are thus more amenable to a probabilistic representation. Most existing imputation algorithms are not well suited to this situation: they rely on prephasing for computational efficiency, and without definite genotype calls the prephasing task becomes computationally expensive. We describe GeneImp, a program for genotype imputation that does not require prephasing and is computationally tractable for whole-genome imputation. GeneImp does not explicitly model recombination; instead, it capitalizes on the existence of large reference panels (comprising thousands of reference haplotypes) and assumes that the reference haplotypes can, unaltered, adequately represent the target haplotypes over short regions. We validate GeneImp on data from ultralow coverage sequencing (0.5×) and compare its performance to the most recent version of BEAGLE that can perform this task. We show that GeneImp achieves imputation quality very close to that of BEAGLE while using one to two orders of magnitude less time, without an increase in memory complexity. Therefore, GeneImp is the first practical choice for whole-genome imputation to a dense reference panel when prephasing cannot be applied, for instance, in datasets produced via ultralow coverage sequencing. A related future application for GeneImp is whole-genome imputation based on the off-target reads from deep whole-exome sequencing.
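The idea of imputing from genotype likelihoods without prephasing or a recombination model can be illustrated with a toy sketch. This is not GeneImp's actual algorithm, which the abstract does not specify in detail. It assumes a simple binomial read-sampling model with a fixed sequencing error rate, biallelic sites, a uniform prior over ordered pairs of reference haplotypes, and an exhaustive search over pairs within a single short window. The function names `genotype_likelihoods` and `impute_window` and the `error_rate` parameter are hypothetical.

```python
from itertools import product
from math import comb


def genotype_likelihoods(ref_reads, alt_reads, error_rate=0.01):
    """Likelihood of the observed read counts at one biallelic site for each
    genotype g in (0, 1, 2), where g is the number of alternate alleles.

    Assumed model (not taken from the paper): each read independently shows
    the alternate allele with probability p_alt, which depends on g and on a
    symmetric sequencing error rate.
    """
    n = ref_reads + alt_reads
    likes = []
    for g in (0, 1, 2):
        p_alt = (g / 2) * (1 - error_rate) + (1 - g / 2) * error_rate
        likes.append(comb(n, alt_reads) * p_alt**alt_reads * (1 - p_alt)**ref_reads)
    return likes  # with zero reads this is [1, 1, 1], i.e. uninformative


def impute_window(ref_haps, site_likes):
    """Posterior genotype probabilities for every site in one short window.

    ref_haps:   list of reference haplotypes, each a list of 0/1 alleles
    site_likes: list of [L(g=0), L(g=1), L(g=2)], one entry per site

    The target is modeled as an unaltered pair of reference haplotypes over
    the whole window (no recombination inside it). Every ordered pair gets
    equal prior weight; the cost is quadratic in the panel size, so this is
    only practical for toy inputs.
    """
    n_sites = len(site_likes)
    posterior = [[0.0, 0.0, 0.0] for _ in range(n_sites)]
    total = 0.0
    for a, b in product(range(len(ref_haps)), repeat=2):
        genos = [ref_haps[a][s] + ref_haps[b][s] for s in range(n_sites)]
        weight = 1.0
        for s, g in enumerate(genos):
            weight *= site_likes[s][g]
        total += weight
        for s, g in enumerate(genos):
            posterior[s][g] += weight
    # total > 0 whenever error_rate > 0, so the normalization is safe here
    return [[p / total for p in site] for site in posterior]


# Toy usage: the middle site has no reads and is filled in from its neighbors.
panel = [[0, 0, 0], [1, 1, 1]]
reads = [(0, 3), (0, 0), (0, 2)]  # (ref_reads, alt_reads) per site
likes = [genotype_likelihoods(r, a) for r, a in reads]
posteriors = impute_window(panel, likes)
```

In this toy case the uncovered middle site takes its genotype probabilities from the haplotype pairs that best explain the reads at the flanking sites, which is the sense in which likelihoods at sparse sites can drive imputation at sites with no data.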


