Amy Ko, Rasmus Nielsen
Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America.
Department of Statistics, University of California, Berkeley, Berkeley, California, United States of America.
PLoS Genet. 2017 Aug 21;13(8):e1006963. doi: 10.1371/journal.pgen.1006963. eCollection 2017 Aug.
Pedigrees contain information about the genealogical relationships among individuals and are of fundamental importance in many areas of genetics. However, pedigrees are often unknown and must be inferred from genetic data. Despite the importance of pedigree inference, existing methods are limited to inferring only close relationships or to analyzing a small number of individuals or loci. We present a simulated annealing method for estimating pedigrees in large samples of otherwise seemingly unrelated individuals using genome-wide SNP data. The method supports complex pedigree structures such as polygamous families, multi-generational families, and pedigrees in which many of the member individuals are missing. Computational speed is greatly enhanced by the use of a composite likelihood function that approximates the full likelihood. We validate our method on simulated data and show that it can infer distant relatives more accurately than existing methods. Furthermore, we illustrate the utility of the method on a sample of Greenlandic Inuit.
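The abstract does not spell out the form of the composite likelihood. One common construction, and the assumption behind this sketch (not necessarily the paper's model), is to score every pair of individuals separately and add up the pairwise log-likelihoods. Here each pairwise term uses unlinked biallelic SNPs, Hardy-Weinberg genotype frequencies, and the standard IBD-sharing coefficients (k0, k1, k2) for each relationship; linkage between SNPs, which a genome-wide method would need to handle, is ignored. The names `genotype_pair_prob`, `pair_loglik`, `composite_loglik`, and `IBD_PROBS` are hypothetical.

```python
import math
from itertools import combinations

# Standard IBD-sharing probabilities (k0, k1, k2) for a few outbred relationships.
# Which relationships to include is an illustrative choice (hypothetical table).
IBD_PROBS = {
    "parent-offspring": (0.0, 1.0, 0.0),
    "full-sibling": (0.25, 0.5, 0.25),
    "half-sibling": (0.5, 0.5, 0.0),
    "first-cousin": (0.75, 0.25, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}


def genotype_pair_prob(g1, g2, p, ibd):
    """P(g1, g2 | IBD state) at one biallelic SNP.

    Genotypes are counts (0, 1, 2) of the allele with frequency p;
    alleles that are not IBD are assumed to be in Hardy-Weinberg proportions.
    """
    q = 1.0 - p
    hwe = {0: q * q, 1: 2 * p * q, 2: p * p}
    if ibd == 0:  # no shared alleles: genotypes are independent
        return hwe[g1] * hwe[g2]
    if ibd == 2:  # both alleles shared: genotypes must be identical
        return hwe[g1] if g1 == g2 else 0.0
    # ibd == 1: one shared allele a, plus one private allele each (b and c)
    allele = {0: q, 1: p}
    total = 0.0
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                if a + b == g1 and a + c == g2:
                    total += allele[a] * allele[b] * allele[c]
    return total


def pair_loglik(geno1, geno2, freqs, relationship):
    """Log-likelihood of two genotype vectors under one relationship,
    treating SNPs as unlinked (an assumption that ignores linkage)."""
    k = IBD_PROBS[relationship]
    ll = 0.0
    for g1, g2, p in zip(geno1, geno2, freqs):
        prob = sum(k[i] * genotype_pair_prob(g1, g2, p, i) for i in range(3))
        if prob == 0.0:
            return -math.inf  # genotypes are incompatible with this relationship
        ll += math.log(prob)
    return ll


def composite_loglik(genotypes, freqs, relationships):
    """Sum pairwise log-likelihoods over every pair of individuals.

    `relationships` maps a sorted pair (i, j) to a key of IBD_PROBS;
    pairs that are not listed are treated as unrelated.
    """
    total = 0.0
    for i, j in combinations(sorted(genotypes), 2):
        rel = relationships.get((i, j), "unrelated")
        total += pair_loglik(genotypes[i], genotypes[j], freqs, rel)
    return total


# Toy usage with made-up genotypes (counts of the allele whose frequency is in freqs).
freqs = [0.5, 0.2, 0.7, 0.4]
genotypes = {"A": [1, 0, 2, 1], "B": [1, 1, 2, 0], "C": [0, 2, 1, 2]}
print(composite_loglik(genotypes, freqs, {("A", "B"): "parent-offspring"}))
```

In this sketch, summing pairwise terms is what keeps the cost low: it grows with the number of pairs and loci rather than with the complexity of the joint pedigree likelihood, at the price of treating overlapping pairs as if they were independent.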
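The search over pedigrees is described only as simulated annealing. The sketch below shows the generic technique under our own assumptions: a geometric cooling schedule, Metropolis-style acceptance of a worse move with probability exp(Δ/T) when maximizing a log-likelihood, and tracking of the best state seen. The toy state is just one relationship label per pair, scored with hypothetical pairwise log-likelihoods; the real method searches over pedigree graphs, whose moves and consistency constraints are not described here and are not modeled. The function names, schedule parameters, and all numbers are hypothetical.

```python
import math
import random


def simulated_annealing(initial, score, neighbor, t_start=10.0, t_end=0.01,
                        n_steps=5000, rng=None):
    """Maximize `score` over a discrete space by simulated annealing.

    Temperature decays geometrically from t_start to t_end over n_steps.
    Returns the best state seen and its score.
    """
    rng = rng or random.Random()
    state, current = initial, score(initial)
    best_state, best = state, current
    decay = (t_end / t_start) ** (1.0 / max(n_steps - 1, 1))
    t = t_start
    for _ in range(n_steps):
        candidate = neighbor(state, rng)
        new = score(candidate)
        delta = new - current
        # Always accept improvements; accept worse moves with prob exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            state, current = candidate, new
            if current > best:
                best_state, best = state, current
        t *= decay
    return best_state, best


# Toy problem: choose one relationship label per pair of individuals.
LABELS = ("unrelated", "first-cousin", "full-sibling")
PAIRS = (("A", "B"), ("A", "C"), ("B", "C"))
# Hypothetical pairwise log-likelihoods (illustrative numbers, not data).
PAIR_LL = {
    ("A", "B"): {"unrelated": -120.0, "first-cousin": -112.5, "full-sibling": -101.0},
    ("A", "C"): {"unrelated": -98.0, "first-cousin": -99.5, "full-sibling": -130.0},
    ("B", "C"): {"unrelated": -105.0, "first-cousin": -103.0, "full-sibling": -118.0},
}


def toy_score(state):
    # Composite-style score: a sum of independent pairwise terms.
    return sum(PAIR_LL[pair][label] for pair, label in zip(PAIRS, state))


def toy_neighbor(state, rng):
    # Change the label of one randomly chosen pair to a different label.
    i = rng.randrange(len(state))
    new_label = rng.choice([lab for lab in LABELS if lab != state[i]])
    return state[:i] + (new_label,) + state[i + 1:]


best_state, best_ll = simulated_annealing(("unrelated",) * 3, toy_score,
                                          toy_neighbor, rng=random.Random(1))
print(best_state, best_ll)
```

Because this toy score is a sum of independent pairwise terms, it has a single optimum that plain hill climbing would also find; annealing earns its keep when pedigree constraints couple the pairs and create local optima.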