Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, United States of America.
Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America.
PLoS Biol. 2024 Oct 9;22(10):e3002847. doi: 10.1371/journal.pbio.3002847. eCollection 2024 Oct.
In both statistical genetics and phylogenetics, a major goal is to identify correlations between a focal trait and genetic loci or other aspects of the phenotype or environment. In these 2 fields, there are sophisticated but disparate statistical traditions aimed at these tasks. The disconnect between their respective approaches is becoming untenable as questions in medicine, conservation biology, and evolutionary biology increasingly rely on integrating data from within and among species, and once-clear conceptual divisions are becoming increasingly blurred. To help bridge this divide, we lay out a general model describing the covariance between the genetic contributions to the quantitative phenotypes of different individuals. Taking this approach shows that standard models in both statistical genetics (e.g., genome-wide association studies; GWAS) and phylogenetic comparative biology (e.g., phylogenetic regression) can be interpreted as special cases of this more general quantitative-genetic model. The fact that these models share the same core architecture means that we can build a unified understanding of the strengths and limitations of different methods for controlling for genetic structure when testing for associations. We develop analytical intuition for why and when spurious correlations may occur, and we conduct population-genetic and phylogenetic simulations of quantitative traits. The structural similarity of problems in statistical genetics and phylogenetics enables us to take methodological advances from one field and apply them in the other. We demonstrate this by showing how a standard GWAS technique, namely including in a regression model both the genetic relatedness matrix (GRM) and its leading eigenvectors (which correspond to the principal components of the genotype matrix), can mitigate spurious correlations in phylogenetic analyses.
As a case study, we re-examine an analysis testing for coevolution of expression levels between genes across a fungal phylogeny and show that including eigenvectors of the covariance matrix as covariates decreases the false positive rate while simultaneously increasing the true positive rate. More generally, this work provides a foundation for more integrative approaches for understanding the genetic architecture of phenotypes and how evolutionary processes shape it.
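The regression strategy described above (a covariance matrix among individuals or species, plus its leading eigenvectors as covariates) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `gls_with_eigenvectors`, the toy 8-tip covariance matrix `K`, and the choice to double-center `K` before extracting eigenvectors (so that they are not collinear with the intercept) are all assumptions made for illustration.

```python
import numpy as np

def gls_with_eigenvectors(y, x, K, n_eig=1):
    """Generalized least squares of y on x, with residual covariance
    proportional to K and the leading eigenvectors of K as extra covariates.

    Hypothetical helper for illustration. Returns the coefficients in the
    order [intercept, slope on x, eigenvector coefficients...].
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = y.shape[0]

    # Assumption: double-center K (as in PCA) so that its eigenvectors are
    # orthogonal to the all-ones intercept column.
    H = np.eye(n) - np.ones((n, n)) / n
    evals, evecs = np.linalg.eigh(H @ K @ H)  # eigenvalues in ascending order
    top = evecs[:, np.argsort(evals)[::-1][:n_eig]]

    # Design matrix: intercept, focal predictor, leading eigenvectors.
    X = np.column_stack([np.ones(n), x, top])

    # GLS estimate: (X' K^-1 X)^-1 X' K^-1 y, using solves instead of inverses.
    Kinv_X = np.linalg.solve(K, X)
    return np.linalg.solve(X.T @ Kinv_X, Kinv_X.T @ y)

# Toy covariance for 8 tips in two clades of 4 (made-up branch lengths):
# 0.2 shared by all tips, 0.3 shared within a clade, 0.5 unique to each tip.
clade = np.kron(np.eye(2), np.ones((4, 4)))
K = 0.2 * np.ones((8, 8)) + 0.3 * clade + 0.5 * np.eye(8)

rng = np.random.default_rng(0)
x = rng.normal(size=8)
y = 2.0 + 3.0 * x  # noise-free trait, so the fit should recover the slope

beta = gls_with_eigenvectors(y, x, K, n_eig=1)
print(beta)
```

In this toy example the leading eigenvector of the centered matrix is the contrast between the two clades, so including it as a covariate absorbs any trait difference between clades that is unrelated to `x`, while the GLS weighting by `K` handles the remaining structure.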