CINBIO, Department of Computer Science, ESEI-Escuela Superior de Ingeniería Informática, Universidade de Vigo, 32004 Ourense, Spain.
SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, 36213 Vigo, Spain.
J Integr Bioinform. 2024 Mar 27;21(2). doi: 10.1515/jib-2023-0046. eCollection 2024 Jun 1.
The vast amount of genome sequence data already available, which is predicted to increase drastically in the near future, can only be handled efficiently by building automated pipelines. Indeed, the Earth BioGenome Project aims to produce high-quality reference genome sequences for all 1.8 million named living eukaryote species, providing unprecedented insight into the evolution of genes and gene families, and thus into biological issues. Here, new modules have been added to auto-phylo (version 2), a recently developed software for addressing biological problems using phylogenetic inferences. These modules cover gene annotation, further BLAST search algorithms, further multiple sequence alignment methods, the addition of reference sequences, further tree rooting methods, the estimation of rates of synonymous and nonsynonymous substitutions, and the identification of positively selected amino acid sites. Additionally, we present auto-phylo-pipeliner, a graphical user interface application that further facilitates the creation and running of auto-phylo pipelines. Inferences on specificity are critical both for cross-based breeding and for the establishment of pollination requirements. Therefore, as a test case, we develop an auto-phylo pipeline to identify amino acid sites under positive selection, which are, in principle, those determining specificity, starting from both non-annotated genomes and sequences available in public databases.