Zhou Yi-Hui, Marron James S, Wright Fred A
Department of Biological Sciences, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, U.S.A.
Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, U.S.A.
Biometrics. 2018 Mar;74(1):155-164. doi: 10.1111/biom.12708. Epub 2017 Apr 27.
The issue of robustness to family relationships in computing genotype ancestry scores such as eigenvector projections has received increased attention in genetic association studies, and is particularly challenging when sets of both unrelated individuals and closely related family members are included. The current standard is to compute loadings (left singular vectors) using unrelated individuals and to compute projected scores for remaining family members. However, projected ancestry scores from this approach suffer from shrinkage toward zero. We consider two main novel strategies: (i) matrix substitution based on decomposition of a target family-orthogonalized covariance matrix, and (ii) using family-averaged data to obtain loadings. We illustrate the performance of these strategies via simulations, including resampling from 1000 Genomes Project data, and analysis of a cystic fibrosis dataset. The matrix substitution approach has similar performance to the current standard, but is simple and uses only a genotype covariance matrix, while the family-average method shows superior performance. Our approaches are accompanied by novel ancillary approaches that provide considerable insight, including individual-specific eigenvalue scree plots.
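The two loading strategies named in the abstract (loadings from unrelated individuals, and loadings from family-averaged data) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation. It assumes a genotype matrix laid out as SNPs by individuals (so that loadings are left singular vectors), per-SNP mean centering, and toy data with random genotypes and families of size two. The function names `loadings_from_reference`, `project_scores`, and `family_average` are hypothetical.

```python
import numpy as np

def loadings_from_reference(G_ref, k):
    """Top-k loadings from a reference genotype matrix (SNPs x individuals).

    Assumption: each SNP (row) is centered by its mean in the reference set.
    Returns the row means, the top-k left singular vectors, and the
    corresponding singular values.
    """
    mu = G_ref.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(G_ref - mu, full_matrices=False)
    return mu, U[:, :k], s[:k]

def project_scores(G_new, mu, U):
    """Project individuals (columns of G_new) onto the loadings U."""
    return (G_new - mu).T @ U  # individuals x k

def family_average(G, family_ids):
    """Average genotype columns within each family (one column per family)."""
    family_ids = np.asarray(family_ids)
    families = np.unique(family_ids)
    return np.column_stack(
        [G[:, family_ids == f].mean(axis=1) for f in families]
    )

# Toy data (hypothetical): 500 SNPs, 40 individuals in 20 families of size 2.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(500, 40)).astype(float)
family_ids = np.repeat(np.arange(20), 2)

# Standard approach: loadings from one member per family, scores by projection.
unrelated = np.arange(0, 40, 2)
mu, U, s = loadings_from_reference(G[:, unrelated], k=2)
scores_projected = project_scores(G, mu, U)

# Family-average approach: loadings from family-averaged genotype columns.
G_fam = family_average(G, family_ids)
mu_f, U_f, s_f = loadings_from_reference(G_fam, k=2)
scores_famavg = project_scores(G, mu_f, U_f)
```

In the standard approach only the reference individuals contribute to the fitted loadings, and the abstract reports that projected scores for the remaining family members shrink toward zero. The family-average variant lets every individual contribute to the loadings through its family mean. The toy genotypes here carry no real ancestry structure, so the sketch shows the mechanics only, not the performance comparison reported in the paper.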