Hatje Klas, Keller Oliver, Hammesfahr Björn, Pillmann Holger, Waack Stephan, Kollmar Martin
Abteilung NMR basierte Strukturbiologie, Max-Planck-Institut für Biophysikalische Chemie, Am Fassberg 11, D-37077 Göttingen, Germany.
BMC Res Notes. 2011 Jul 28;4:265. doi: 10.1186/1756-0500-4-265.
Obtaining the transcripts of homologs from closely related organisms and reconstructing the exon-intron patterns of the corresponding genes is an important step both in analysing the evolution of a protein family and in comparing the exon-intron structure of a given gene across species. Because genome sequencing keeps accelerating, the gap between sequencing and genome annotation is growing. Tools for the correct prediction and reconstruction of genes in related organisms are therefore becoming increasingly important. The tool Scipio, which can also be used via the graphical interface WebScipio, extensively post-processes the hits reported by the Blat program to account for sequencing errors, missing sequence, and fragmented genome assemblies. However, Scipio has so far been limited to searches with high sequence similarity and has been unable to reconstruct short exons.
Scipio and WebScipio have been fundamentally extended to better reconstruct very short exons and intron splice sites and to be better suited for cross-species gene structure predictions. The Needleman-Wunsch algorithm has been implemented to search for short parts of the query sequence that were not recognized by Blat. Such regions may be short exons, divergent sequence at intron splice sites, or very divergent exons. We demonstrate the benefit and use of the new parameters with several protein examples from completely different protein families in searches against species from several eukaryotic kingdoms. The performance of the new Scipio version has been compared with that of several similar tools.
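To illustrate the kind of global alignment named above, the following is a minimal sketch of the textbook Needleman-Wunsch algorithm in Python. It is not Scipio's implementation: the function name `needleman_wunsch`, the linear gap penalty, and the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are assumptions chosen for illustration, and the sketch aligns two plain strings rather than a protein fragment against translated genomic sequence.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions, not Scipio's parameters.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Because the alignment is global, every residue of the short query fragment has to be placed, which is what makes this family of algorithms a natural fit for fragments too short to be seeded by a heuristic search. A real gene-structure tool would additionally need a protein substitution matrix, frame handling, and splice-site awareness, none of which are shown here.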
With the new version of Scipio, very short exons, both terminal and internal, of even just one amino acid can be reconstructed correctly. Scipio is also able to correctly predict almost all genes in cross-species searches, even if the ancestors of the species separated more than 100 Myr ago and the protein sequence identity is below 80%. For our test cases, Scipio outperforms all other software tested. WebScipio has been restructured and provides easy access to the genome assemblies of about 640 eukaryotic species. Scipio and WebScipio are freely accessible at http://www.webscipio.org.