Gusev Alexander, Lowe Jennifer K, Stoffel Markus, Daly Mark J, Altshuler David, Breslow Jan L, Friedman Jeffrey M, Pe'er Itsik
Department of Computer Science, Columbia University, New York, New York 10027, USA.
Genome Res. 2009 Feb;19(2):318-26. doi: 10.1101/gr.081398.108. Epub 2008 Oct 29.
We present GERMLINE, a robust algorithm for identifying segmental sharing indicative of recent common ancestry between pairs of individuals. Unlike methods with comparable objectives, GERMLINE scales linearly with the number of samples, enabling analysis of whole-genome data in large cohorts. Our approach is based on a dictionary of haplotypes that is used to efficiently discover short exact matches between individuals. We then expand these matches using dynamic programming to identify long, nearly identical shared segments that are indicative of relatedness. We use GERMLINE to comprehensively survey hidden relatedness both in the HapMap and in a densely typed island population of 3000 individuals. We verify that GERMLINE agrees with other methods wherever those methods can process the data, and that it also enables the analysis of larger-scale studies. We bolster these results by demonstrating novel applications of precise analysis of hidden relatedness for (1) identifying and resolving phasing errors and (2) exposing polymorphic deletions that are otherwise challenging to detect. The detected deletions are supported by their concordance with other evidence from independent databases and with statistical analyses of fluorescence intensity, which GERMLINE does not use.
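The seed-and-extend idea described in the abstract (a haplotype dictionary for short exact matches, followed by extension into long, nearly identical segments) can be illustrated with a toy sketch. It is not GERMLINE's implementation. It assumes phased haplotypes encoded as equal-length strings of allele codes, and its parameters (`word_len`, `max_mismatch`, `min_len`, all measured in markers) are hypothetical. In particular, the extension step here is a simple greedy scan with a fixed mismatch budget, standing in for the dynamic-programming extension the paper describes.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical input: phased haplotypes as equal-length strings of allele codes.
Segment = Tuple[str, str, int, int]  # (hap_a, hap_b, start, end), end exclusive


def find_seeds(haps: Dict[str, str], word_len: int) -> Dict[Tuple[str, str], List[int]]:
    """Dictionary step: in each non-overlapping window, bucket haplotypes by
    their exact allele word; every pair sharing a bucket is a seed match."""
    n_markers = len(next(iter(haps.values())))
    seeds: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for w in range(n_markers // word_len):
        start = w * word_len
        buckets: Dict[str, List[str]] = defaultdict(list)
        for name in sorted(haps):
            buckets[haps[name][start:start + word_len]].append(name)
        for members in buckets.values():
            for a, b in combinations(members, 2):
                seeds[(a, b)].append(start)  # window starts stay in increasing order
    return seeds


def extend(h1: str, h2: str, start: int, end: int, max_mismatch: int) -> Tuple[int, int]:
    """Greedy stand-in for the DP extension (hypothetical): grow [start, end)
    right, then left, while total mismatched markers stay <= max_mismatch."""
    mism = 0
    while end < len(h1) and mism + (h1[end] != h2[end]) <= max_mismatch:
        mism += h1[end] != h2[end]
        end += 1
    while start > 0 and mism + (h1[start - 1] != h2[start - 1]) <= max_mismatch:
        mism += h1[start - 1] != h2[start - 1]
        start -= 1
    # Do not let a segment begin or end on a mismatched marker.
    while end > start and h1[end - 1] != h2[end - 1]:
        end -= 1
    while start < end and h1[start] != h2[start]:
        start += 1
    return start, end


def shared_segments(haps: Dict[str, str], word_len: int = 4,
                    max_mismatch: int = 1, min_len: int = 12) -> List[Segment]:
    """Seed with exact word matches, extend each seed, keep long segments."""
    out: List[Segment] = []
    for (a, b), starts in sorted(find_seeds(haps, word_len).items()):
        covered_until = -1
        for s in starts:
            if s < covered_until:
                continue  # seed already lies inside the previous extended segment
            lo, hi = extend(haps[a], haps[b], s, s + word_len, max_mismatch)
            covered_until = hi
            if hi - lo >= min_len:
                out.append((a, b, lo, hi))
    return out


if __name__ == "__main__":
    haps = {
        "A": "0101010101010101",
        "B": "0101010100010101",  # differs from A only at marker 9
        "C": "1111111111111111",
    }
    print(shared_segments(haps))  # [('A', 'B', 0, 16)]
    print(shared_segments(haps, max_mismatch=0, min_len=4))
    # [('A', 'B', 0, 9), ('A', 'B', 10, 16)]
```

With a one-marker mismatch budget, A and B are reported as sharing the whole region despite the discordant marker; with no budget, the same seeds yield two shorter exact segments. Bucketing a window touches each haplotype once, so the dictionary pass grows linearly with the number of haplotypes, and pairwise work arises only for haplotypes that actually share a word.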