Institute of Zoology, Zoological Society of London, London NW1 4RY, United Kingdom.
Genetics. 2012 May;191(1):183-94. doi: 10.1534/genetics.111.138149. Epub 2012 Feb 23.
Many methods have been proposed to infer sibship and parentage among individuals from their multilocus marker genotypes. They are all based on Mendelian laws, either qualitatively (exclusion methods) or quantitatively (likelihood methods), but they differ in their optimization criteria and in the algorithms they use to search for the optimal solution. The full-likelihood method assigns sibship and parentage relationships among all sampled individuals jointly. It is by far the most accurate method, but is computationally prohibitive for large data sets with many individuals and many loci. In this article I propose a new likelihood-based method that is computationally efficient enough to handle large data sets. The method uses the sum of the log likelihoods of pairwise relationships in a configuration as the score to measure its plausibility, where log likelihoods of pairwise relationships are calculated only once and stored for repeated use. By analyzing several empirical and many simulated data sets, I show that the new method is more accurate than pairwise likelihood and exclusion-based methods, but is slightly less accurate than the full-likelihood method. However, the new method is computationally much more efficient than the full-likelihood method, and when both sexes are polygamous and markers have genotyping errors, it can be several orders of magnitude faster. The new method can handle large samples of thousands of individuals, with the number of markers limited only by computer memory.
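The scoring idea described in the abstract (sum the stored pairwise log likelihoods implied by a configuration) can be illustrated with a minimal Python sketch. This is not the article's implementation: the function names, the restriction to two relationship classes (full sibs "FS" and unrelated "UR"), and the log-likelihood values are all hypothetical, chosen only to show how a table computed once can be reused to score many candidate configurations. Parentage, genotyping errors, and the search over configurations are left out.

```python
from itertools import combinations

# Simplifying assumption: only two pairwise relationships are considered.
RELATIONSHIPS = ("FS", "UR")  # full sibs, unrelated


def build_pairwise_table(individuals, loglik_fn):
    """Compute every pairwise log likelihood once and store it for reuse."""
    table = {}
    for a, b in combinations(sorted(individuals), 2):
        for rel in RELATIONSHIPS:
            table[(a, b, rel)] = loglik_fn(a, b, rel)
    return table


def configuration_score(families, table):
    """Sum the stored log likelihoods of the pairwise relationships
    implied by a partition of individuals into full-sib families."""
    family_of = {ind: k for k, fam in enumerate(families) for ind in fam}
    score = 0.0
    for a, b in combinations(sorted(family_of), 2):
        rel = "FS" if family_of[a] == family_of[b] else "UR"
        score += table[(a, b, rel)]
    return score


# Made-up values for illustration; real ones would be derived from
# multilocus genotypes and allele frequencies.
TOY_LOGLIK = {
    ("A", "B", "FS"): -1.0, ("A", "B", "UR"): -3.0,
    ("A", "C", "FS"): -5.0, ("A", "C", "UR"): -2.0,
    ("B", "C", "FS"): -5.0, ("B", "C", "UR"): -2.0,
}
calls = []  # records how often the (hypothetical) likelihood is evaluated


def toy_loglik(a, b, rel):
    calls.append((a, b, rel))
    return TOY_LOGLIK[(a, b, rel)]


table = build_pairwise_table(["A", "B", "C"], toy_loglik)

candidates = [
    [["A", "B", "C"]],        # one family of three
    [["A", "B"], ["C"]],      # A and B are sibs, C is unrelated
    [["A"], ["B"], ["C"]],    # all unrelated
]
scores = [configuration_score(c, table) for c in candidates]
best = candidates[scores.index(max(scores))]
```

In this toy case the likelihood function is evaluated six times (three pairs, two relationships each) no matter how many candidate configurations are scored afterwards, which is the source of the efficiency the abstract describes. Scoring each configuration is then only a lookup and a sum.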