Fukunaga Tsukasa, Iwasaki Wataru
Waseda Institute for Advanced Study, Waseda University, Tokyo 1690051, Japan.
Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 1130032, Japan.
Bioinform Adv. 2021 Jul 30;1(1):vbab014. doi: 10.1093/bioadv/vbab014. eCollection 2021.
Reconstruction of gene copy number evolution is an essential approach for understanding how complex biological systems have been organized. Although various models have been proposed for gene copy number evolution, existing evolutionary models have not appropriately addressed the fact that different gene families can have very different gene gain/loss rates.
In this study, we developed Mirage (MIxtuRe model for Ancestral Genome Estimation), which allows different gene families to have flexible gene gain/loss rates. Mirage can use three models for formulating heterogeneous evolution among gene families: the discretized Γ model, probability distribution-free model and pattern mixture (PM) model. Simulation analysis showed that Mirage can accurately estimate heterogeneous gene gain/loss rates and reconstruct gene-content evolutionary history. Application to empirical datasets demonstrated that the PM model fits genome data from various taxonomic groups better than the other heterogeneous models. Using Mirage, we revealed that metabolic function-related gene families displayed frequent gene gains and losses in all taxa investigated.
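The idea of letting gene families fall into categories with different gain/loss rates can be illustrated with a minimal sketch. It is not Mirage's implementation and makes several simplifying assumptions: gene content is reduced to presence/absence (two states) rather than copy number, the ancestral states on each branch are treated as known, and the two rate categories, their mixture weights and the example families are all hypothetical values chosen for illustration. Under those assumptions, each category is a two-state continuous-time Markov chain, and a family's posterior probability of belonging to each category follows from Bayes' rule.

```python
import math

def transition_prob(gain, loss, t, a, b):
    """P(state b at branch end | state a at branch start) for a
    two-state (0 = absent, 1 = present) gain/loss Markov chain."""
    total = gain + loss
    decay = math.exp(-total * t)
    p01 = gain / total * (1.0 - decay)  # gene gained along the branch
    p10 = loss / total * (1.0 - decay)  # gene lost along the branch
    table = {(0, 0): 1.0 - p01, (0, 1): p01,
             (1, 0): p10, (1, 1): 1.0 - p10}
    return table[(a, b)]

def family_likelihood(branches, gain, loss):
    """Likelihood of one gene family under a single rate category.
    branches: list of (parent_state, child_state, branch_length);
    parent states are assumed known, which real inference does not assume."""
    like = 1.0
    for a, b, t in branches:
        like *= transition_prob(gain, loss, t, a, b)
    return like

def category_posteriors(branches, categories, weights):
    """Posterior probability of each rate category for one gene family."""
    joint = [w * family_likelihood(branches, g, l)
             for (g, l), w in zip(categories, weights)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical categories: (gain rate, loss rate) for slow and fast families.
categories = [(0.05, 0.05), (1.0, 1.0)]
weights = [0.5, 0.5]

# Hypothetical families: one never changes state, one flips on every branch.
stable_family = [(1, 1, 0.5), (1, 1, 0.5), (1, 1, 0.5)]
volatile_family = [(1, 0, 0.5), (0, 1, 0.5), (1, 0, 0.5)]

print(category_posteriors(stable_family, categories, weights))
print(category_posteriors(volatile_family, categories, weights))
```

In this toy setting the family that never changes state is assigned mostly to the slow category and the family that changes on every branch is assigned almost entirely to the fast one. A full method would additionally have to estimate the rates and weights from data and sum over unobserved ancestral states on the phylogeny.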
The source code of Mirage is freely available at https://github.com/fukunagatsu/Mirage.
Supplementary data are available at Bioinformatics Advances online.