Selle Maria Lie, Steinsland Ingelin, Lindgren Finn, Brajkovic Vladimir, Cubric-Curik Vlatka, Gorjanc Gregor
Department of Mathematical Sciences, Norwegian University of Science and Technology (NTNU), Trondheim, Norway.
School of Mathematics, University of Edinburgh, Edinburgh, United Kingdom.
Front Genet. 2021 Jan 15;11:531218. doi: 10.3389/fgene.2020.531218. eCollection 2020.
We introduce a hierarchical model to estimate haplotype effects based on phylogenetic relationships between haplotypes and their association with observed phenotypes. A population typically contains many, but not all possible, distinct haplotypes, with few observations per haplotype. Further, haplotype frequencies tend to vary substantially. Such a data structure challenges the estimation of haplotype effects. However, haplotypes often differ by only a few mutations, and leveraging these similarities can improve the estimation of effects. We build on an extensive literature and develop an autoregressive model of order one that models haplotype effects by leveraging phylogenetic relationships described with a directed acyclic graph. The phylogenetic relationships can take the form of either a tree or a network, and we refer to the model as the haplotype network model. The model can be included as a component in a phenotype model to estimate associations between haplotypes and phenotypes. Our key contribution is a sparse model in which, through hierarchical autoregression, the flow of information between similar haplotypes is estimated from the data. A simulation study shows that the hierarchical model can improve estimates of haplotype effects compared to an independent haplotype model, especially when there are few observations for a specific haplotype. We also compared it to a mutation model and observed comparable performance, although the haplotype model has the potential to capture background-specific effects. We demonstrate the model with a study of mitochondrial haplotype effects on milk yield in cattle. We provide R code to fit the model with the INLA package.
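To illustrate why an order-one autoregression over a phylogenetic DAG yields a sparse model, and how it lets a rarely observed haplotype borrow information from its relatives, here is a minimal Python/NumPy sketch. It is not the authors' R/INLA code, and its parameterization is an assumption for illustration: each root haplotype has effect x_h ~ N(0, sigma^2); each other haplotype has x_h = rho * (mean of its parents' effects) + e_h with e_h ~ N(0, sigma^2 (1 - rho^2)); phenotypes are y_i = x_{h(i)} + noise with no intercept. The helper names `haplotype_ar1_precision` and `posterior_mean` are hypothetical. Dense matrices are used for readability; in practice the precision matrix would be stored in a sparse format.

```python
import numpy as np


def haplotype_ar1_precision(parents, rho, sigma):
    """Precision matrix Q of a hypothetical AR(1) prior on a haplotype DAG.

    parents[h] lists the parent indices of haplotype h (empty for a root).
    Assumed model: roots x_h ~ N(0, sigma^2); other nodes
    x_h = rho * mean(x_parents) + e_h, e_h ~ N(0, sigma^2 * (1 - rho^2)).
    """
    n = len(parents)
    B = np.zeros((n, n))  # B[h, p] = weight of parent p in the mean of x_h
    d = np.empty(n)       # conditional (innovation) variance of each node
    for h, pa in enumerate(parents):
        if pa:
            B[h, pa] = rho / len(pa)
            d[h] = sigma**2 * (1.0 - rho**2)
        else:
            d[h] = sigma**2
    # x = B x + e  =>  (I - B) x = e  =>  Q = (I - B)^T D^{-1} (I - B).
    # Q[i, j] is nonzero only for parent-child pairs or co-parents of a node,
    # so Q stays sparse however large the DAG is.
    I_minus_B = np.eye(n) - B
    return I_minus_B.T @ np.diag(1.0 / d) @ I_minus_B


def posterior_mean(Q, hap_index, y, sigma_e):
    """Posterior mean of haplotype effects given phenotypes y.

    Assumed phenotype model: y_i = x[hap_index[i]] + N(0, sigma_e^2) noise.
    """
    n = Q.shape[0]
    Z = np.zeros((len(y), n))  # incidence matrix: observation -> haplotype
    Z[np.arange(len(y)), hap_index] = 1.0
    P = Q + Z.T @ Z / sigma_e**2  # posterior precision
    return np.linalg.solve(P, Z.T @ np.asarray(y, dtype=float) / sigma_e**2)


if __name__ == "__main__":
    # Tree: haplotype 0 is the root, 1 and 2 descend from 0, 3 descends from 1.
    tree = [[], [0], [0], [1]]
    Q = haplotype_ar1_precision(tree, rho=0.5, sigma=1.0)
    print(np.round(Q, 3))

    # Only haplotype 1 is observed; haplotype 3 has no records.
    obs_hap, y = [1, 1, 1], [2.0, 2.2, 1.8]
    print(posterior_mean(Q, obs_hap, y, sigma_e=1.0))
    # Independent-haplotype comparison: rho = 0 gives no borrowing.
    Q_ind = haplotype_ar1_precision(tree, rho=0.0, sigma=1.0)
    print(posterior_mean(Q_ind, obs_hap, y, sigma_e=1.0))
```

With rho > 0, the unobserved haplotype 3 receives a nonzero estimate pulled toward its parent, whereas the independent model (rho = 0) leaves it at the prior mean of zero. In the paper, the strength of this borrowing is estimated from the data rather than fixed as it is here.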