Milligan Brook G
Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh EH9 3JT, Scotland.
Genetics. 2003 Mar;163(3):1153-67. doi: 10.1093/genetics/163.3.1153.
Relatedness between individuals is central to many studies in genetics and population biology. A variety of estimators have been developed to quantify relatedness from molecular marker data. Despite this, no effort has been made to characterize the traditional maximum-likelihood estimator in relation to the others. This article quantifies its statistical performance under a range of biologically relevant sampling conditions. Under the same range of conditions, the statistical performance of five other commonly used estimators of relatedness is quantified. Comparison among these estimators indicates that the traditional maximum-likelihood estimator exhibits a lower standard error under essentially all conditions. Only for very large amounts of genetic information do most of the other estimators approach the likelihood estimator. However, the likelihood estimator is more biased than any of the others, especially when the amount of genetic information is low or the actual relationship being estimated is near the boundary of the parameter space. Even under these conditions, the amount of bias can be greatly reduced, potentially to biologically irrelevant levels, with suitable genetic sampling. Additionally, the likelihood estimator generally exhibits the lowest root mean-square error, an indication that the bias is in fact quite small. Alternative estimators restricted to yield only biologically interpretable estimates exhibit lower standard errors and greater bias than do unrestricted ones, but generally do not improve on the maximum-likelihood estimator and in some cases exhibit even greater bias. Although some nonlikelihood estimators perform better with respect to specific metrics under some conditions, none approaches the high level of performance exhibited by the likelihood estimator across all conditions and all metrics of performance.
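The three performance metrics named in the abstract (bias, standard error, and root mean-square error) are linked by the standard identity RMSE² = bias² + SE², which is why a low RMSE together with a low standard error implies a small bias. The following is a minimal sketch of how these metrics might be computed from replicate estimates. It is not the article's procedure: the function name `performance` and the replicate values are hypothetical, chosen only for illustration.

```python
import math
import statistics


def performance(estimates, true_value):
    """Return (bias, standard_error, rmse) for replicate estimates of one true value."""
    mean_estimate = statistics.fmean(estimates)
    bias = mean_estimate - true_value
    # Population standard deviation (divisor n) of the replicates, so that
    # rmse**2 == bias**2 + se**2 holds up to floating-point rounding.
    se = statistics.pstdev(estimates)
    rmse = math.sqrt(statistics.fmean((e - true_value) ** 2 for e in estimates))
    return bias, se, rmse


# Hypothetical replicate estimates of relatedness for a pair whose true
# relatedness is 0.5 (for example, full siblings). Values are made up.
replicates = [0.42, 0.55, 0.61, 0.47, 0.50]
bias, se, rmse = performance(replicates, true_value=0.5)
print(f"bias={bias:.4f}  se={se:.4f}  rmse={rmse:.4f}")
```

With the sample standard deviation (divisor n - 1) in place of `pstdev`, the decomposition holds only approximately, so the choice of divisor matters when checking the identity numerically.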