Santiago Caio Rafael do Nascimento, Assis Renata de Almeida Barbosa, Moreira Leandro Marcio, Digiampietri Luciano Antonio
Bioinformatics Graduate Program, University of Sao Paulo, Sao Paulo, Brazil.
Adventist University of Sao Paulo, Sao Paulo, Brazil.
Front Genet. 2019 Aug 26;10:725. doi: 10.3389/fgene.2019.00725. eCollection 2019.
Genomics research has produced an exponentially growing amount of data. However, genetic knowledge of certain phenotypic characteristics is still lacking. In addition, a considerable fraction of the coding sequences (CDSs) in these genomes have unknown functions, which poses further challenges to researchers. Phylogenetically close microorganisms share many of their CDSs, and certain phenotypes unique to a set of microorganisms may result from genes found exclusively in those microorganisms. This study presents the GTACG framework, an easy-to-use tool that takes subgroups of bacterial genomes whose microorganisms share phenotypic characteristics and finds, simply and quickly, the data that differentiate them from other related genomes. GTACG analysis is based on forming clusters of homologous CDSs from local alignments. The front-end is easy to use, and installation packages have been developed so that users without knowledge of programming languages or bioinformatics can use the tool to analyze high-throughput data. The GTACG framework was validated with a case study of 161 genomes from the Xanthomonadaceae family, in which 19 families of orthologous proteins were found in 90% of the plant-associated genomes, allowing the identification of proteins potentially associated with adaptation and virulence in plant tissue. The results show the potential of GTACG in the search for new targets for molecular studies, and GTACG can serve as a research tool for biologists who lack advanced knowledge of computational tools for bacterial comparative genomics.
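The abstract describes two steps: grouping CDSs into homologous clusters from local alignments, then selecting the families shared by most genomes of a phenotypic subgroup (for example, families present in 90% of the plant-associated genomes). A minimal Python sketch of that logic is given below. It is not GTACG's implementation: the single-linkage union-find clustering, the identity cut-off, the `(genome, locus)` CDS identifiers, the optional limit on occurrences outside the subgroup, and the function names `cluster_cds` and `group_specific_families` are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def cluster_cds(hits, min_identity=0.4):
    """Group CDSs into homologous clusters by single linkage over alignment hits.

    hits: iterable of (cds_a, cds_b, identity) tuples, e.g. parsed from
    local-alignment output. Each CDS is a hypothetical (genome, locus) tuple.
    The identity cut-off is an assumption, not a GTACG parameter.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, identity in hits:
        ra, rb = find(a), find(b)  # registers both CDSs even if the hit is rejected
        if identity >= min_identity and ra != rb:
            parent[ra] = rb  # merge the two clusters

    clusters = defaultdict(set)
    for cds in parent:
        clusters[find(cds)].add(cds)
    return list(clusters.values())


def group_specific_families(clusters, target, others, min_target=0.9, max_other=0.0):
    """Keep clusters found in >= min_target of the target genomes and in
    <= max_other of the remaining genomes (both thresholds are assumptions)."""
    selected = []
    for cluster in clusters:
        genomes = {genome for genome, _ in cluster}
        in_target = len(genomes & target) / len(target)
        in_other = len(genomes & others) / len(others) if others else 0.0
        if in_target >= min_target and in_other <= max_other:
            selected.append(cluster)
    return selected


if __name__ == "__main__":
    # Toy data: three "plant-associated" genomes (g1-g3) and one other (g4).
    hits = [
        (("g1", "cds1"), ("g2", "cds7"), 0.82),
        (("g2", "cds7"), ("g3", "cds2"), 0.75),
        (("g1", "cds5"), ("g4", "cds9"), 0.90),
        (("g1", "cds8"), ("g4", "cds3"), 0.20),  # below the identity cut-off
    ]
    clusters = cluster_cds(hits)
    for family in group_specific_families(clusters, {"g1", "g2", "g3"}, {"g4"}):
        print(sorted(family))
```

Single linkage is the simplest way to turn pairwise hits into clusters, but it can chain distantly related sequences together; stricter schemes such as reciprocal best hits or graph-based clustering are common alternatives.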