Mutte Sumanth Kumar, Weijers Dolf
Laboratory of Biochemistry, Wageningen University, 6708WE, Wageningen, the Netherlands.
Bio Protoc. 2020 Mar 20;10(6):e3566. doi: 10.21769/BioProtoc.3566.
Phylogenetics is an important area of evolutionary biology that helps us understand the origin and divergence of genes, genomes and species. Accurately reconstructing the evolutionary past of genes or proteins requires reliable and robust methods for building meaningful phylogenetic trees. With the rapidly increasing availability of genome and transcriptome sequencing data, there is a need for efficient and accurate methodologies for ancestral state reconstruction. Currently available methods are mostly specific to certain gene families and require substantial adaptation before they can be applied to other gene families. Hence, a generalized framework is essential to utilize large transcriptome resources such as OneKP and MMETSP. Here, we have developed a flexible yet efficient method whose core strengths are inclusive homolog selection and ortholog definition based on multi-layered inferences. We illustrate how specific steps can be modified to fit the needs of any protein family under consideration. We also demonstrate the success of this protocol by studying and testing the orthologs in various gene families. Taken together, we present a protocol for reconstructing the ancestral states of various domains and proteins across multiple kingdoms of eukaryotes, using thousands of transcriptomes.