Dylus David, Nevers Yannis, Altenhoff Adrian M, Gürtler Antoine, Dessimoz Christophe, Glover Natasha M
Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland.
Department of Computational Biology, University of Lausanne, Lausanne, 1015, Switzerland.
F1000Res. 2020 Jun 4;9:511. doi: 10.12688/f1000research.23790.2. eCollection 2020.
Knowledge of species phylogeny is critical to many fields of biology. In an era of widely available genome data, the most common way to build a phylogenetic species tree is to use multiple protein-coding genes that are conserved across multiple species. This methodology consists of several steps: orthology inference, multiple sequence alignment, and inference of the phylogeny with dedicated tools. This can be a difficult task, and orthology inference in particular is usually computationally intensive and error prone if done ad hoc. This tutorial provides protocols that use OMA Orthologous Groups, sets of genes that are all orthologous to each other, to infer a phylogenetic species tree. It is designed to be user-friendly and computationally inexpensive by providing two options: (1) using only precomputed groups for species available on the OMA Browser, or (2) computing orthologs with OMA Standalone for additional species, with the option of using precomputed orthology relations for those species already present in OMA. A protocol for downstream analyses is provided as well, covering supermatrix creation, tree inference, and visualization. All protocols use publicly available software, and we provide scripts and code snippets to facilitate data handling. The protocols are accompanied by practical examples.
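The supermatrix step mentioned above can be illustrated with a minimal sketch. It is not the script distributed with the tutorial. It assumes each orthologous group has already been aligned and is held in memory as a mapping from species code to aligned sequence. The function name `build_supermatrix`, the species codes, and the toy sequences are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def build_supermatrix(alignments: List[Dict[str, str]], missing: str = "-") -> Dict[str, str]:
    """Concatenate per-group alignments into one sequence per species.

    Each element of `alignments` maps a species code to its aligned sequence
    for one orthologous group. A species absent from a group is padded with
    the `missing` character over that group's full width.
    """
    # Collect every species seen in any group, in a stable order.
    species = sorted({sp for aln in alignments for sp in aln})
    parts: Dict[str, List[str]] = {sp: [] for sp in species}

    for aln in alignments:
        # All sequences within one alignment must share the same length.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must have equal length")
        width = lengths.pop()
        for sp in species:
            parts[sp].append(aln.get(sp, missing * width))

    return {sp: "".join(chunks) for sp, chunks in parts.items()}


if __name__ == "__main__":
    # Two toy aligned groups (hypothetical data).
    group1 = {"HUMAN": "MKT-A", "MOUSE": "MKTLA"}
    group2 = {"HUMAN": "GG", "YEAST": "GA"}
    for sp, seq in build_supermatrix([group1, group2]).items():
        print(f">{sp}\n{seq}")
```

Padding absent species with gap characters keeps every row of the matrix the same length, which is what tree inference tools generally expect from a concatenated alignment.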