Crossett Andrew, Lee Ann B, Klei Lambertus, Devlin Bernie, Roeder Kathryn
West Chester University, Carnegie Mellon University, University of Pittsburgh School of Medicine, University of Pittsburgh School of Medicine and Carnegie Mellon University.
Ann Appl Stat. 2013 Jun 27;7(2):669-690. doi: 10.1214/12-AOAS598.
Recent technological advances coupled with large sample sets have uncovered many factors underlying the genetic basis of traits and the predisposition to complex disease, but much is left to discover. A common thread in most genetic investigations is familial relationships. Close relatives can be identified from family records, and more distant relatives can be inferred from large panels of genetic markers. Unfortunately, these empirical estimates can be noisy, especially for distant relatives. We propose a new method for denoising genetically inferred relationship matrices by exploiting the underlying structure due to hierarchical groupings of correlated individuals. The approach, which we call Treelet Covariance Smoothing, employs a multiscale decomposition of covariance matrices to improve estimates of pairwise relationships. On both simulated and real data, we show that smoothing leads to better estimates of the relatedness amongst distantly related individuals. We illustrate our method with a large genome-wide association study and estimate the "heritability" of body mass index quite accurately. Traditionally, heritability, defined as the fraction of the total trait variance attributable to additive genetic effects, is estimated from samples of closely related individuals using random effects models. We show that by using smoothed relationship matrices we can estimate heritability from population-based samples. Finally, while our methods have been developed for refining genetic relationship matrices and improving estimates of heritability, they have much broader potential application in statistics, most notably in errors-in-variables random effects models and in settings that require regularization of matrices with block or hierarchical structure.
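The abstract does not spell out the smoothing procedure, so the following is only a rough sketch of the general idea, not the authors' Treelet Covariance Smoothing. It builds a treelet-style basis in the manner of Lee, Nadler and Wasserman (2008): at each level, the two most correlated active variables are rotated (a Jacobi rotation) so that their covariance becomes zero, and the lower-variance coordinate is retired as a "difference" variable. The function names `treelet_basis` and `treelet_smooth`, the parameters `levels` and `threshold`, the thresholding rule itself, and the two-family synthetic matrix are all hypothetical choices made for illustration.

```python
import numpy as np


def treelet_basis(cov, levels):
    """Treelet-style basis: at each level, rotate the two most correlated
    active variables so that their covariance becomes zero (a Jacobi rotation),
    keep the higher-variance coordinate as the "sum" variable and retire the
    other as a "difference" variable. Returns (basis, active_indices)."""
    sigma = np.array(cov, dtype=float)
    p = sigma.shape[0]
    basis = np.eye(p)  # columns are basis vectors in the original coordinates
    active = list(range(p))
    for _ in range(min(levels, p - 1)):
        # Find the most similar pair of active variables (absolute correlation).
        best, pair = -1.0, None
        for pos, i in enumerate(active):
            for j in active[pos + 1:]:
                prod = sigma[i, i] * sigma[j, j]
                sim = abs(sigma[i, j]) / np.sqrt(prod) if prod > 0 else 0.0
                if sim > best:
                    best, pair = sim, (i, j)
        a, b = pair
        # Angle that zeroes the rotated (a, b) entry: tan(2*theta) = 2*s_ab / (s_aa - s_bb).
        theta = 0.5 * np.arctan2(2.0 * sigma[a, b], sigma[a, a] - sigma[b, b])
        c, s = np.cos(theta), np.sin(theta)
        rot = np.eye(p)
        rot[a, a], rot[a, b], rot[b, a], rot[b, b] = c, -s, s, c
        sigma = rot.T @ sigma @ rot
        basis = basis @ rot
        # The lower-variance coordinate leaves the active set.
        active.remove(b if sigma[a, a] >= sigma[b, b] else a)
    return basis, active


def treelet_smooth(cov, levels, threshold):
    """HYPOTHETICAL smoothing rule (an assumption, not the paper's method):
    zero small off-diagonal coefficients in the treelet basis, then rotate back."""
    cov = np.asarray(cov, dtype=float)
    basis, _ = treelet_basis(cov, levels)
    coeffs = basis.T @ cov @ basis
    off_diag = ~np.eye(cov.shape[0], dtype=bool)
    coeffs[off_diag & (np.abs(coeffs) < threshold)] = 0.0
    return basis @ coeffs @ basis.T


# Synthetic, made-up example: two families of three relatives, relatedness 0.5
# within a family and 0 between families, plus symmetric estimation noise.
true_rel = np.kron(np.eye(2), np.full((3, 3), 0.5)) + 0.5 * np.eye(6)
rng = np.random.default_rng(0)
noise = rng.normal(scale=0.05, size=(6, 6))
noisy_rel = true_rel + (noise + noise.T) / 2
smoothed = treelet_smooth(noisy_rel, levels=5, threshold=0.1)
print(np.round(smoothed, 2))
```

Because the treelet basis is orthonormal, calling `treelet_smooth` with `threshold=0.0` returns the input matrix unchanged; any smoothing comes entirely from the assumed thresholding step, and how many levels to build and where to threshold are left open in this sketch.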