CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Porto, Portugal.
PLoS One. 2011;6(10):e26638. doi: 10.1371/journal.pone.0026638. Epub 2011 Oct 26.
Members of the ITS2 gene class show high sequence divergence, which has complicated both its annotation and its use for reconstructing phylogenies at higher taxonomic levels (beyond species and genus). Several alignment strategies have been implemented to improve ITS2 annotation quality and its use in phylogenetic inference. Although alignment-based methods have been pushed to the limit of their complexity to tackle both issues, no alignment-free approach has yet addressed both successfully. By contrast, simple alignment-free classifiers, such as topological indices (TIs) that encode information about the sequence and structure of ITS2, may prove useful for gene prediction and for assessing the phylogenetic relationships of the ITS2 class in eukaryotes. Thus, we used the TI2BioP (Topological Indices to BioPolymers) methodology [1], [2], freely available at http://ti2biop.sourceforge.net/, to calculate two different classes of TIs. One class was derived from artificial 2D structures of ITS2 generated from DNA strings, and the other from the secondary structure inferred by RNA folding algorithms. Two alignment-free models based on Artificial Neural Networks were developed for ITS2 class prediction using these two classes of TIs. Both models performed similarly on the training and test sets, with overall classification rates above 95%. Because of the importance of the ITS2 region for fungal identification, a novel ITS2 genomic sequence was isolated from Petrakia sp. This sequence and the test set were used for a comparative evaluation against conventional classification models based on multiple sequence alignments, such as Hidden Markov based approaches, which revealed that our models succeed in identifying novel ITS2 members. The isolated sequence was also analyzed with traditional and alignment-free techniques for phylogenetic inference, to complement the taxonomy of the Petrakia sp. fungal isolate.
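The idea of an alignment-free topological index can be illustrated with a small sketch. It is not the TI2BioP implementation. The grid mapping in `STEPS`, the function names, and the choice of spectral moments (traces of powers of the adjacency matrix) as the index are all assumptions made here for illustration. The sketch maps a DNA string to a 2D walk, builds a graph from the visited grid points, and summarizes that graph as a short numeric vector.

```python
# Illustrative sketch only: NOT the TI2BioP code or its actual indices.
# Assumption: each nucleotide is a unit step on a 2D grid (hypothetical mapping).
STEPS = {"A": (-1, 0), "G": (1, 0), "C": (0, 1), "T": (0, -1)}


def walk_graph(seq):
    """Map a DNA string to the adjacency matrix of its 2D walk graph."""
    pos = (0, 0)
    index = {pos: 0}          # grid point -> node number
    edges = set()             # undirected edges between visited points
    for base in seq.upper():
        dx, dy = STEPS[base]  # raises KeyError on non-ACGT symbols
        nxt = (pos[0] + dx, pos[1] + dy)
        if nxt not in index:
            index[nxt] = len(index)
        i, j = index[pos], index[nxt]
        edges.add((min(i, j), max(i, j)))
        pos = nxt
    n = len(index)
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(adj, max_order):
    """Return [trace(A^1), ..., trace(A^max_order)] for adjacency matrix A."""
    moments = []
    power = adj
    for _ in range(max_order):
        moments.append(sum(power[i][i] for i in range(len(adj))))
        power = matmul(power, adj)
    return moments


if __name__ == "__main__":
    # The vector has the same length for any input sequence.
    print(spectral_moments(walk_graph("ACGTTGCA"), 4))
```

Because the output vector has a fixed length regardless of sequence length, descriptors of this kind could be passed directly to a classifier such as a neural network without first aligning the sequences.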