Ewens W J, Hodge S E, Ping F H
Am J Hum Genet. 1986 Apr;38(4):555-66.
We consider the question: In a segregation analysis, can knowledge of the family-size distribution (FSD) in the population from which a sample is drawn improve the estimators of genetic parameters? In other words, should one incorporate the population FSD into a segregation analysis if one knows it? If so, then under what circumstances? And how much improvement may result? We examine the variance and bias of the maximum likelihood estimators both asymptotically and in finite samples. We consider Poisson and geometric FSDs, as well as a simple two-valued FSD in which all families in the population have either one or two children. We limit our study to a simple genetic model with truncate selection. We find that if the FSD is completely specified, then the asymptotic variance of the estimator may be reduced by as much as 5%-10%, especially when the FSD is heavily skewed toward small families. Results in small samples are less clear-cut. For some of the simple two-valued FSDs, the variance of the estimator in small samples of one- and two-child families may actually be increased slightly when the FSD is included in the analysis. If one knows only the statistical form of the FSD, but not its parameter, then the estimator is improved only minutely. Our study also underlines the fact that results derived from asymptotic maximum likelihood theory do not necessarily hold in small samples. We conclude that in most practical applications it is not worth incorporating the FSD into a segregation analysis. However, this practice may be justified under special circumstances where the FSD is completely specified, without error, and the population consists overwhelmingly of small families.
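To make the comparison concrete, the following is a minimal sketch of one common way to write the two likelihoods; it is not taken from the paper, whose exact model the abstract does not spell out. It assumes a single segregation ratio `p`, sibships ascertained under truncate selection (every family with at least one affected child is sampled), and a hypothetical grid-search helper `mle`. Without the FSD, each family of size `s` with `r` affected children contributes the binomial probability conditioned on `r >= 1`. With a completely specified FSD `f(s)`, the observed family size is itself informative, and the contribution becomes `f(s) * C(s, r) * p^r * (1-p)^(s-r)` divided by the population probability of ascertainment.

```python
import math

def log_lik(p, families, fsd=None):
    """Log-likelihood of segregation ratio p under truncate selection.

    families: list of (s, r) pairs, sibship size s with r >= 1 affected.
    fsd: optional dict {size: population probability} (assumed fully known).
    """
    q = 1.0 - p
    if fsd is not None:
        # Probability that a random population family is ascertained,
        # i.e. has at least one affected child.
        norm = sum(f * (1.0 - q ** s) for s, f in fsd.items())
    total = 0.0
    for s, r in families:
        binom = math.comb(s, r) * p ** r * q ** (s - r)
        if fsd is None:
            # Condition on family size and on at least one affected child.
            total += math.log(binom / (1.0 - q ** s))
        else:
            # Family size is treated as random with the known FSD.
            total += math.log(fsd[s] * binom / norm)
    return total

def mle(families, fsd=None, grid=9999):
    """Hypothetical helper: maximize log_lik over an evenly spaced grid in (0, 1)."""
    candidates = [(i + 1) / (grid + 1) for i in range(grid)]
    return max(candidates, key=lambda p: log_lik(p, families, fsd))

# Illustrative data only: one-child and two-child families, as in the
# two-valued FSD described in the abstract.
families = [(1, 1), (2, 1), (2, 2)]
p_without_fsd = mle(families)
p_with_fsd = mle(families, fsd={1: 0.5, 2: 0.5})
```

In this sketch, one-child families carry no information about `p` when the FSD is ignored, because their conditional likelihood is identically 1; they contribute only once the FSD is included. That is one way to see why a population skewed toward small families is where the FSD could matter most.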