Meyer Karin, Kirkpatrick Mark
University of New England, Armidale NSW 2351, Australia.
Genetics. 2008 Oct;180(2):1153-66. doi: 10.1534/genetics.108.090159. Epub 2008 Aug 30.
Eigenvalues and eigenvectors of covariance matrices are important statistics for multivariate problems in many applications, including quantitative genetics. Estimates of these quantities are subject to different types of bias. This article reviews and extends the existing theory on these biases, considering a balanced one-way classification and restricted maximum-likelihood estimation. Biases are due to the spread of sample roots and arise from ignoring selected principal components when imposing constraints on the parameter space, to ensure positive semidefinite estimates or to estimate covariance matrices of chosen, reduced rank. In addition, it is shown that reduced-rank estimators that consider only the leading eigenvalues and eigenvectors of the "between-group" covariance matrix may be biased due to selecting the wrong subset of principal components. In a genetic context, with groups representing families, this bias is inversely proportional to the degree of genetic relationship among family members, but is independent of sample size. Theoretical results are supplemented by a simulation study, demonstrating close agreement between predicted and observed bias for large samples. It is emphasized that the rank of the genetic covariance matrix should be chosen sufficiently large to accommodate all important genetic principal components, even though, paradoxically, this may require including a number of components with negligible eigenvalues. A strategy for rank selection in practical analyses is outlined.
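The bias attributed to the spread of sample roots can be illustrated with a small simulation. The sketch below is not the balanced one-way classification or the restricted maximum-likelihood estimator considered in the article; it assumes a plain one-sample covariance matrix of independent normal variables whose population eigenvalues are all equal to 1. The function name `mean_sample_eigenvalues` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def mean_sample_eigenvalues(true_eigenvalues, n_obs, n_reps=2000, seed=0):
    """Average the ordered eigenvalues of sample covariance matrices.

    Assumption for this sketch: the population covariance matrix is
    diagonal, so its eigenvalues are simply the variances of the variables.
    """
    rng = np.random.default_rng(seed)
    true_eigenvalues = np.asarray(true_eigenvalues, dtype=float)
    p = true_eigenvalues.size
    scale = np.sqrt(true_eigenvalues)
    totals = np.zeros(p)
    for _ in range(n_reps):
        # Draw n_obs records with independent normal variables.
        data = rng.standard_normal((n_obs, p)) * scale
        sample_cov = np.cov(data, rowvar=False)
        # eigvalsh returns ascending eigenvalues; store them largest first.
        totals += np.linalg.eigvalsh(sample_cov)[::-1]
    return totals / n_reps


# Hypothetical setting: 5 traits, 20 records, all population eigenvalues 1.
true_vals = np.ones(5)
mean_vals = mean_sample_eigenvalues(true_vals, n_obs=20)
print("population eigenvalues:", true_vals)
print("mean sample eigenvalues:", np.round(mean_vals, 3))
print("mean of sample eigenvalues:", round(float(mean_vals.mean()), 3))
```

Under these assumptions the largest sample eigenvalue is on average above 1 and the smallest below 1, while their mean stays close to 1: the sample roots are more spread out than the population roots even though the total variance is estimated without bias.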