Computational Genomics, IBM TJ Watson Research, Yorktown Heights, NY 10598, USA.
Bioinformatics. 2013 Jul 1;29(13):i162-70. doi: 10.1093/bioinformatics/btt237.
Detecting identity-by-descent (IBD) tracts is an important problem in genetics. Most existing methods focus on detecting pairwise IBD tracts and have relatively low power to detect short ones. Methods that detect IBD tracts among multiple individuals simultaneously, known as group-wise IBD tracts, perform better on short tracts. Group-wise IBD tracts have a wide range of applications, such as disease mapping and pedigree reconstruction. The existing group-wise IBD tract detection method is computationally inefficient and can only handle small datasets, such as 20 or 30 individuals with hundreds of SNPs. It also requires the number of IBD groups to be specified in advance, where IBD groups are partitions of the individuals such that all individuals in the same partition are IBD with each other; this may not be realistic in many cases. Because of scalability issues, the method can only handle a small number of IBD groups, such as two or three. Moreover, it does not take linkage disequilibrium (LD) into consideration.
In this work, we developed an efficient method, IBD-Groupon, which detects group-wise IBD tracts based on pairwise IBD relationships and addresses all of the drawbacks mentioned above. To our knowledge, it is the first practical group-wise IBD tract detection method that scales to very large datasets, for example hundreds of individuals with thousands of SNPs, while retaining the power to detect short IBD tracts. Our method does not require the number of IBD groups to be specified; the groups are detected automatically. It also takes LD into consideration, because it is based on pairwise IBD tracts, where LD can be easily incorporated.
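To illustrate how groups can emerge from pairwise IBD relationships without fixing their number in advance, here is a minimal sketch. It is not the IBD-Groupon algorithm, whose details are not given in this abstract. It swaps in a much simpler technique: at a single SNP position, individuals joined by any pairwise tract covering that position are merged with union-find, so the groups are the connected components of the pairwise IBD graph. The function name `ibd_groups_at` and the `(a, b, start, end)` tuple layout are assumptions made for this example.

```python
from collections import defaultdict


def ibd_groups_at(position, pairwise_tracts, individuals):
    """Partition individuals into IBD groups at one SNP position.

    pairwise_tracts: iterable of (a, b, start, end) tuples, where a and b
    are individual IDs and start/end are inclusive SNP indices
    (a hypothetical layout chosen for this sketch).
    """
    # Union-find: every individual starts as its own group.
    parent = {ind: ind for ind in individuals}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two individuals of every tract that covers the position.
    for a, b, start, end in pairwise_tracts:
        if start <= position <= end:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect members by root; the number of groups is not specified
    # anywhere, it simply falls out of the merging.
    groups = defaultdict(set)
    for ind in individuals:
        groups[find(ind)].add(ind)

    # Largest groups first, ties broken alphabetically, for stable output.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


tracts = [("A", "B", 0, 50), ("B", "C", 10, 60), ("D", "E", 40, 90)]
people = ["A", "B", "C", "D", "E"]
print(ibd_groups_at(45, tracts, people))  # [['A', 'B', 'C'], ['D', 'E']]
print(ibd_groups_at(5, tracts, people))   # [['A', 'B'], ['C'], ['D'], ['E']]
```

Connected components assume that pairwise calls are transitive, which noisy pairwise detection does not guarantee; a real method would need to reconcile inconsistent calls and to link groups across neighbouring positions.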