Gianola Daniel, Fariello Maria I, Naya Hugo, Schön Chris-Carolin
Department of Animal Sciences, University of Wisconsin-Madison, Wisconsin 53706
Department of Dairy Science, University of Wisconsin-Madison, Wisconsin 53706
Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Wisconsin 53706
School of Life Sciences Weihenstephan, Technical University of Munich, D-85354 Freising, Germany
Institute for Advanced Study, Technical University of Munich, D-85748 Garching, Germany
Bioinformatics Unit, Institut Pasteur de Montevideo, 11400, Uruguay
Instituto de Matemática y Estadística Rafael Laguardia, Facultad de Ingeniería, Universidad de la República, 11300 Montevideo, Uruguay
G3 (Bethesda). 2016 Oct 13;6(10):3241-3256. doi: 10.1534/g3.116.034256.
Standard genome-wide association studies (GWAS) scan for relationships between each of p molecular markers and a continuously distributed target trait. Typically, a marker-based matrix of genomic similarities among individuals (G) is constructed to account more properly for the covariance structure in the linear regression model used. We show that the generalized least-squares estimator of the regression of phenotype on one or on m markers is invariant with respect to whether or not the tested marker(s) are used to build G, provided the variance components are unaffected by excluding such marker(s) from G. The result is obtained with a matrix expression that yields the many inverses of genomic relationship (or phenotypic covariance) matrices arising from removing the markers tested as fixed effects, while carrying out only a single inversion. When eigenvectors of the genomic relationship matrix are used as regressors with fixed regression coefficients, e.g., to account for population stratification, their removal from G does matter. Removal of eigenvectors from G can have a noticeable effect on estimates of genomic and residual variances, so caution is needed. The concepts are illustrated using genomic data on 599 wheat inbred lines, with grain yield as the target trait, and on close to 200 Arabidopsis thaliana accessions.
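As a minimal numerical sketch of the invariance result and the single-inversion idea, the Python/NumPy code below simulates hypothetical marker data, inverts the phenotypic covariance matrix V = var_g * G + var_e * I once, and then obtains the inverse with one marker dropped from G through the Sherman-Morrison rank-one update. It then compares the GLS estimates for that marker with and without it in G. The scaling constant c = p, the simulated genotypes and phenotypes, the variance values, and the helper names `v_inv_without` and `gls` are all assumptions for illustration; the rank-one update is a standard identity and may not be the exact matrix expression derived in the paper.

```python
import numpy as np

# Hypothetical simulated data: n individuals, p markers coded 0/1/2.
# Variance components are assumed known and held fixed, mirroring the
# proviso that they are unaffected by excluding a marker from G.
rng = np.random.default_rng(1)
n, p = 60, 200
M = rng.integers(0, 3, size=(n, p)).astype(float)
Z = M - M.mean(axis=0)           # column-centred marker matrix
c = p                            # assumed scaling constant for G
G = Z @ Z.T / c                  # marker-based genomic relationship matrix
var_g, var_e = 0.5, 1.0          # assumed genomic and residual variances
y = rng.normal(size=n)           # placeholder phenotypes

V = var_g * G + var_e * np.eye(n)
V_inv = np.linalg.inv(V)         # the single explicit inversion


def v_inv_without(j):
    """Hypothetical helper: inverse of V after dropping marker j from G.

    Dropping z_j gives V_{-j} = V - k z_j z_j' with k = var_g / c, so by
    Sherman-Morrison V_{-j}^{-1} = V^{-1} + k u u' / (1 - k z_j' u),
    where u = V^{-1} z_j. No new inversion is needed.
    """
    z = Z[:, j]
    k = var_g / c
    u = V_inv @ z
    return V_inv + k * np.outer(u, u) / (1.0 - k * (z @ u))


def gls(X, W):
    """Hypothetical helper: GLS estimate (X' W X)^{-1} X' W y with W = V^{-1}."""
    XtW = X.T @ W
    return np.linalg.solve(XtW @ X, XtW @ y)


j = 7
X = np.column_stack([np.ones(n), Z[:, j]])   # intercept + tested marker
beta_full = gls(X, V_inv)                    # marker j kept in G
beta_drop = gls(X, v_inv_without(j))         # marker j removed from G
print(beta_full, beta_drop)                  # should agree up to rounding error
```

The two estimates agree here only because var_g and var_e are held fixed. Re-estimating them after excluding the marker, as would usually happen in practice, can break the agreement, which is exactly the proviso stated above.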