da Silva Filho Antonio Camilo, Raittz Roberto Tadeu, Guizelini Dieval, De Pierri Camilla Reginatto, Augusto Diônata Willian, Dos Santos-Weiss Izabella Castilhos Ribeiro, Marchaukoski Jeroniza Nunes
Department of Bioinformatics, Professional and Technical Education Sector, Federal University of Parana, Curitiba, Brazil.
Department of Biochemistry and Molecular Biology, Federal University of Parana, Curitiba, Brazil.
Front Genet. 2018 Dec 12;9:619. doi: 10.3389/fgene.2018.00619. eCollection 2018.
Genomic island prediction tools rely on comparative genomic analysis, sequence composition analysis, or both. Comparative analysis aims to identify regions that are unique to one genome among those of related organisms, whereas sequence composition analysis compares the composition of specific regions with that of the rest of the same genome. The goal of this study was to evaluate existing genomic island predictors both qualitatively and quantitatively. We chose tools reported to produce significant results using sequence composition, comparative genomics, and hybrid methods. To maintain diversity, the tools were applied to eight complete genomes of organisms with distinct characteristics and belonging to different families. Escherichia coli CFT073 was used as a control and treated as the gold standard because its islands were previously curated. The predictions for the gold standard were manually curated, and the content and characteristics of each predicted island were analyzed. For the other organisms, we used Artemis to create a GenBank (GBK) file for each predicted island. We extracted only the amino acid sequences of the coding sequences and built one multi-FASTA file per predictor. We then used BLASTp to compare all results and generate hits, from which we evaluated the similarities and differences among the predictions. Comparison with the gold standard showed that GIPSy produced the best results, covering ~91% of the composition and regions of the islands, followed by Alien Hunter (81%), IslandViewer (47.8%), Predict Bias (31%), GI Hunter (17%), and Zisland Explorer (16%). The tools that performed best in the analyses of the full set of organisms were the same ones that performed best in the tests with the gold standard.
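The abstract reports coverage percentages but does not spell out how they were calculated. As a minimal sketch of one plausible reading, the Python below measures the fraction of curated island base pairs that fall inside a predictor's islands. The interval representation, the function names, and the coordinates are all hypothetical illustrations, not the authors' pipeline, which compared protein sequences with BLASTp.

```python
from typing import List, Tuple

# Hypothetical representation: 1-based, inclusive (start, end) genome coordinates.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals into disjoint ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(gold: List[Interval], predicted: List[Interval]) -> float:
    """Fraction of gold-standard island bases that lie inside predicted islands."""
    gold_merged = merge_intervals(gold)
    pred_merged = merge_intervals(predicted)
    total = sum(end - start + 1 for start, end in gold_merged)
    if total == 0:
        return 0.0
    covered = 0
    for g_start, g_end in gold_merged:
        for p_start, p_end in pred_merged:
            lo, hi = max(g_start, p_start), min(g_end, p_end)
            if lo <= hi:
                covered += hi - lo + 1
    return covered / total


# Made-up coordinates for illustration only (not CFT073 island positions).
gold_islands = [(100, 199), (300, 399)]
predicted_islands = [(150, 249), (350, 360), (355, 420)]
print(f"{covered_fraction(gold_islands, predicted_islands):.1%}")
```

Both interval sets are merged first so that overlapping predictions are not counted twice. A metric like this rewards only sensitivity: a tool that flags most of the genome would also score highly, so it should be read alongside a measure of over-prediction.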