Usuka J, Zhu W, Brendel V
Department of Chemistry, Stanford University, Stanford, CA 94305, USA; Department of Zoology and Genetics, Iowa State University, 2112 Molecular Biology Building, Ames, IA 50011-3260, USA.
Bioinformatics. 2000 Mar;16(3):203-11. doi: 10.1093/bioinformatics/16.3.203.
Supplementary cDNA or EST evidence is often decisive for discriminating between alternative gene predictions derived from computational sequence inspection by any of a number of gene prediction programs. Without additional experimental effort, this approach must rely on the occurrence of cognate ESTs for the gene under consideration in the available, generally incomplete, EST collections for the given species. In some cases, particular exon assignments can be supported by sequence matching even if the cDNA or EST derives from non-cognate genomic DNA, such as a different locus of a gene family or a homologous locus from a different species. However, marginally significant sequence matching alone can also be misleading. We sought to develop an algorithm that would simultaneously score predicted intrinsic splice site strength and sequence matching between the genomic DNA template and a related cDNA or EST. Under such combined scoring, weakly predicted splice sites may be chosen for the optimal-scoring spliced alignment on the basis of surrounding sequence matching, whereas strongly predicted splice sites will enter the optimal spliced alignment even without strong sequence matching.
We designed a novel algorithm that produces the optimal spliced alignment of a genomic DNA sequence with a cDNA or EST, based on scoring for both sequence matching and intrinsic splice site strength. By example, we demonstrate that this combined approach appears to improve gene prediction accuracy compared with current methods that rely solely on either search by content and signal or sequence similarity.
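The idea of scoring sequence matching and splice site strength together can be illustrated with a small dynamic-programming sketch. This is not the published algorithm or its scoring scheme, which the abstract does not specify. It is a minimal toy under stated assumptions: the function names (`donor_score`, `acceptor_score`, `spliced_align`), the match/mismatch/gap values, and the two-level splice-site scores (consensus GT/AG versus anything else) are all hypothetical choices made for illustration.

```python
NEG = float("-inf")
MATCH, MISMATCH, GAP = 1.0, -1.0, -2.0  # illustrative values, not from the paper
SITE_STRONG, SITE_WEAK = -1.0, -6.0     # toy splice-site scores (assumed)


def donor_score(g, k):
    """Toy score for an intron whose first base is g[k] (consensus GT)."""
    return SITE_STRONG if g[k:k + 2] == "GT" else SITE_WEAK


def acceptor_score(g, k):
    """Toy score for an intron whose last base is g[k-1] (consensus AG)."""
    return SITE_STRONG if k >= 2 and g[k - 2:k] == "AG" else SITE_WEAK


def spliced_align(g, c):
    """Return (score, introns); introns are half-open (start, end) genomic spans."""
    n, m = len(g), len(c)
    E = [[NEG] * (m + 1) for _ in range(n + 1)]   # best score ending in an exon
    I = [[NEG] * (m + 1) for _ in range(n + 1)]   # best score ending in an intron
    bE = [[None] * (m + 1) for _ in range(n + 1)]  # back-pointers for E
    bI = [[None] * (m + 1) for _ in range(n + 1)]  # back-pointers for I
    for i in range(n + 1):
        E[i][0] = 0.0                              # leading genomic flank is free
    for j in range(1, m + 1):
        E[0][j], bE[0][j] = j * GAP, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Intron state: extend an open intron (no cost) or open one at g[i-1].
            extend = I[i - 1][j]
            opened = E[i - 1][j] + donor_score(g, i - 1)
            I[i][j], bI[i][j] = (extend, "I") if extend >= opened else (opened, "E")
            # Exon state: match/mismatch, gap, or close the intron ending at g[i-1].
            sub = MATCH if g[i - 1] == c[j - 1] else MISMATCH
            E[i][j], bE[i][j] = max(
                [(E[i - 1][j - 1] + sub, "diag"),
                 (E[i - 1][j] + GAP, "up"),
                 (E[i][j - 1] + GAP, "left"),
                 (I[i][j] + acceptor_score(g, i), "close")],
                key=lambda t: t[0])
    # Trailing genomic flank is free: take the best end row for the full cDNA.
    i = max(range(n + 1), key=lambda r: E[r][m])
    score, j, state, end, introns = E[i][m], m, "E", None, []
    while j > 0:                                   # trace back to recover introns
        if state == "E":
            move = bE[i][j]
            if move == "close":
                state, end = "I", i
            else:
                if move in ("diag", "up"):
                    i -= 1
                if move in ("diag", "left"):
                    j -= 1
        else:
            if bI[i][j] == "E":                    # intron opened at g[i-1]
                introns.append((i - 1, end))
                state = "E"
            i -= 1
    return score, introns[::-1]


exon1, exon2 = "ATGGCTCACC", "TTCGACTGAA"
cdna = exon1 + exon2
strong = exon1 + "GTAAGTCTTTTCTTTTCCAG" + exon2   # consensus GT donor
weak = exon1 + "GCAAGTCTTTTCTTTTCCAG" + exon2     # non-consensus GC donor
print(spliced_align(strong, cdna))
print(spliced_align(weak, cdna))
```

In both toy inputs the intron is placed at genomic positions 10 to 30. With the non-consensus donor the splice-site term is lower, but the perfect match of the flanking exons to the cDNA still makes that junction the best-scoring choice, which mirrors the behavior described above for weakly predicted sites.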
The algorithm is available as a C subroutine and is implemented in the SplicePredictor and GeneSeqer programs. The source code is available via anonymous ftp from ftp.zmdb.iastate.edu. Both programs are also implemented as Web services at http://gremlin1.zool.iastate.edu/cgi-bin/sp.cgi and http://gremlin1.zool.iastate.edu/cgi-bin/gs.cgi, respectively.