Phasing of many thousands of genotyped samples.

Affiliations

Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.

Publication information

Am J Hum Genet. 2012 Aug 10;91(2):238-51. doi: 10.1016/j.ajhg.2012.06.013.

Abstract

Haplotypes are an important resource for a large number of applications in human genetics, but computationally inferred haplotypes are subject to switch errors that decrease their utility. The accuracy of computationally inferred haplotypes increases with sample size, and although ever larger genotypic data sets are being generated, the fact that existing methods require substantial computational resources limits their applicability to data sets containing tens or hundreds of thousands of samples. Here, we present HAPI-UR (haplotype inference for unrelated samples), an algorithm that is designed to handle unrelated and/or trio and duo family data, that has accuracy comparable to or greater than existing methods, and that is computationally efficient and can be applied to 100,000 samples or more. We use HAPI-UR to phase a data set with 58,207 samples and show that it achieves practical runtime and that switch errors decrease with sample size even with the use of samples from multiple ethnicities. Using a data set with 16,353 samples, we compare HAPI-UR to Beagle, MaCH, IMPUTE2, and SHAPEIT and show that HAPI-UR runs 18× faster than all methods and has a lower switch-error rate than do other methods except for Beagle; with the use of consensus phasing, running HAPI-UR three times gives a slightly lower switch-error rate than Beagle does and is more than six times faster. We demonstrate results similar to those from Beagle on another data set with a higher marker density. Lastly, we show that HAPI-UR has better runtime scaling properties than does Beagle so that for larger data sets, HAPI-UR will be practical and will have an even larger runtime advantage. HAPI-UR is available online (see Web Resources).
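
The abstract compares methods by switch-error rate without defining it. As a rough illustration only, the sketch below computes one common form of the metric for a single sample: among consecutive heterozygous sites, the fraction of adjacent pairs whose inferred relative phase disagrees with the truth. The function name `switch_error_rate` and the list-of-alleles input format are assumptions made for this example; this is not the evaluation code used by the authors or by HAPI-UR.

```python
from typing import Sequence


def switch_error_rate(
    true_h1: Sequence[int],
    true_h2: Sequence[int],
    inf_h1: Sequence[int],
    inf_h2: Sequence[int],
) -> float:
    """Switch-error rate for one sample (hypothetical helper, illustration only).

    Each argument is one haplotype given as a sequence of alleles (e.g. 0/1),
    all of the same length and in the same marker order.
    """
    n = len(true_h1)
    if not (len(true_h2) == len(inf_h1) == len(inf_h2) == n):
        raise ValueError("all haplotypes must have the same length")

    # For every heterozygous site, record whether the first inferred
    # haplotype carries the allele of the first true haplotype.
    orientations = []
    for t1, t2, i1, i2 in zip(true_h1, true_h2, inf_h1, inf_h2):
        if t1 == t2:
            continue  # homozygous sites carry no phase information
        if {i1, i2} != {t1, t2}:
            raise ValueError("inferred and true genotypes disagree")
        orientations.append(i1 == t1)

    if len(orientations) < 2:
        return 0.0  # fewer than two heterozygous sites: no switch is possible

    # A switch error occurs wherever the orientation flips between
    # two consecutive heterozygous sites.
    switches = sum(a != b for a, b in zip(orientations, orientations[1:]))
    return switches / (len(orientations) - 1)


if __name__ == "__main__":
    truth_a = [0, 1, 0, 1, 0, 1]
    truth_b = [1, 0, 1, 0, 1, 1]
    guess_a = [0, 1, 1, 0, 1, 1]
    guess_b = [1, 0, 0, 1, 0, 1]
    print(switch_error_rate(truth_a, truth_b, guess_a, guess_b))
```

In the usage example, the last site is homozygous and is skipped; the inferred phase flips once across the five heterozygous sites, so one of the four adjacent pairs counts as a switch error.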

Similar articles

1
Phasing of many thousands of genotyped samples.
Am J Hum Genet. 2012 Aug 10;91(2):238-51. doi: 10.1016/j.ajhg.2012.06.013.
6
Fast two-stage phasing of large-scale sequence data.
Am J Hum Genet. 2021 Oct 7;108(10):1880-1890. doi: 10.1016/j.ajhg.2021.08.005. Epub 2021 Sep 2.
8
Rapid haplotype inference for nuclear families.
Genome Biol. 2010;11(10):R108. doi: 10.1186/gb-2010-11-10-r108. Epub 2010 Oct 29.

Cited by

5
Accurate genome-wide phasing from IBD data.
BMC Bioinformatics. 2022 Nov 23;23(1):502. doi: 10.1186/s12859-022-05066-2.
6
A comparative analysis of current phasing and imputation software.
PLoS One. 2022 Oct 19;17(10):e0260177. doi: 10.1371/journal.pone.0260177. eCollection 2022.
10
Fast two-stage phasing of large-scale sequence data.
Am J Hum Genet. 2021 Oct 7;108(10):1880-1890. doi: 10.1016/j.ajhg.2021.08.005. Epub 2021 Sep 2.

References cited in this article

1
Genotype imputation with thousands of genomes.
G3 (Bethesda). 2011 Nov;1(6):457-70. doi: 10.1534/g3.111.001198. Epub 2011 Nov 1.
