Key Laboratory of Horticultural Plant Biology of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China.
College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan 430070, China.
Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac276.
Protein phylogenetic analysis examines the evolutionary relationships among related protein sequences and can help researchers infer protein functions and developmental trajectories. In the big data era, existing protein phylogenetic methods, including distance-matrix and character-based methods, face challenges in both running time and scope of application. Here, we developed CProtMEDIAS, an R package for protein phylogenetic analysis. In contrast to existing phylogenetic methods, CProtMEDIAS uses dimensionality reduction algorithms to digitize multiple sequence alignments and to quickly analyze large numbers of amino acid sequences from similarly distant protein families and species. We used CProtMEDIAS to perform dimensionality reduction, clustering, pseudotime, specific-residue and evolutionary trajectory analyses of the plant homeobox superfamily. We found that CProtMEDIAS delivers consistent clustering, short running times and elegant presentation, and thus provides powerful new tools and methods for protein clustering and evolutionary analysis.
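The abstract does not say which numeric encoding or which dimensionality reduction algorithm CProtMEDIAS uses, so the general idea of digitizing an alignment and projecting it into a low-dimensional space can only be sketched under assumptions. The Python sketch below is not the CProtMEDIAS API and is not written in R. It assumes a one-hot encoding of the 20 standard residues (gaps and unknown symbols become all-zero blocks) and uses the first principal component, computed by power iteration, as a stand-in for the dimensionality reduction step. The function names `one_hot_encode` and `first_pc_scores` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_encode(alignment):
    """Turn each aligned sequence into a flat 0/1 vector.

    Each alignment column becomes a block of 20 values; gaps and
    non-standard symbols leave their block at all zeros (an assumption).
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    vectors = []
    for seq in alignment:
        vec = [0.0] * (width * len(AMINO_ACIDS))
        for pos, residue in enumerate(seq.upper()):
            if residue in index:
                vec[pos * len(AMINO_ACIDS) + index[residue]] = 1.0
        vectors.append(vec)
    return vectors


def first_pc_scores(vectors, iterations=100):
    """Project each vector onto the first principal component.

    Uses power iteration on X^T X (X = mean-centered data) without
    forming the matrix explicitly. The sign of the scores is arbitrary.
    """
    n = len(vectors)
    means = [sum(col) / n for col in zip(*vectors)]
    centered = [[x - m for x, m in zip(row, means)] for row in vectors]
    # Deterministic start: the centered row with the largest norm.
    w = max(centered, key=lambda row: sum(x * x for x in row))
    norm = math.sqrt(sum(x * x for x in w))
    if norm == 0.0:  # all sequences identical: nothing to separate
        return [0.0] * n
    w = [x / norm for x in w]
    for _ in range(iterations):
        scores = [sum(x * y for x, y in zip(row, w)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(len(w))]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return [sum(x * y for x, y in zip(row, w)) for row in centered]


# Toy alignment: two made-up groups of three sequences each.
alignment = ["MKVLA", "MKVLA", "MKVIA", "GRTDE", "GRTDE", "GRSDE"]
scores = first_pc_scores(one_hot_encode(alignment))
```

In this toy case the two groups should fall on opposite sides of zero along the first component, which is the kind of separation a clustering step could then use. A real analysis would read the alignment from a file and would likely use a library implementation with more than one component.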