Chao K M
Department of Computer Science and Information Management, Providence University, Shalu, Taichung, Taiwan.
Bioinformatics. 1999 Apr;15(4):298-304. doi: 10.1093/bioinformatics/15.4.298.
Given a genomic DNA sequence, determining its coding regions, i.e. the regions consisting of exons and introns, is still an open problem. Comparing cDNA with genomic DNA helps in understanding coding regions. For such an application, it may be adequate to use restricted affine gap penalties, which charge an affine penalty for short gaps but only a constant penalty for long gaps.
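As an illustration of the gap model described above, one common way to write a restricted affine gap penalty is as an affine cost that is capped at a constant. The sketch below assumes that form; the function name and the parameter values (`gap_open`, `gap_extend`, `gap_cap`) are hypothetical and are not taken from the paper.

```python
def restricted_affine_gap(k, gap_open=2, gap_extend=1, gap_cap=6):
    """Cost of a gap of length k under an assumed restricted affine model.

    Short gaps pay the usual affine cost gap_open + gap_extend * k;
    long gaps (e.g. introns) pay at most the constant gap_cap.
    All parameter values are illustrative assumptions.
    """
    if k <= 0:
        return 0  # no gap, no cost
    return min(gap_open + gap_extend * k, gap_cap)


if __name__ == "__main__":
    # Print the cost for a few gap lengths to show where the cap takes over.
    for k in (1, 2, 4, 100, 10000):
        print(k, restricted_affine_gap(k))
```

With these assumed values, every gap longer than four bases costs the same as a gap of four, so an intron-sized gap in the cDNA is not penalized in proportion to its length.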
Several techniques developed for solving the approximate string-matching problem are employed to yield efficient algorithms for computing the optimal alignment with restricted affine gap penalties. In particular, efficient algorithms can be derived based on the suffix automaton with failure transitions and on the diagonalwise monotonicity of the cost tables. We have implemented the above methods in C on Sun workstations running SunOS Unix. Preliminary experiments show that these approaches are very promising for aligning a cDNA sequence with a genomic DNA sequence.
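For orientation, the following is a minimal sketch of a plain quadratic-time dynamic program that computes the optimal global alignment cost under a capped affine gap penalty. It is a baseline only: it does not use the suffix automaton or the diagonalwise monotonicity mentioned above, and it is not the Calign implementation. The function name, the cost values, and the capped form of the penalty, min(gap_open + gap_extend * k, gap_cap), are all assumptions made for illustration.

```python
import math


def align_cost(a, b, mismatch=1, gap_open=2, gap_extend=1, gap_cap=6):
    """Minimum global alignment cost of a and b (baseline O(len(a)*len(b)) DP).

    A gap of length k is assumed to cost min(gap_open + gap_extend*k, gap_cap).
    Each gap is tracked in two states, one priced affinely and one priced at
    the constant cap, and the cheaper one is kept.
    """
    m, n = len(a), len(b)
    inf = math.inf

    def table():
        return [[inf] * (n + 1) for _ in range(m + 1)]

    best = table()       # best[i][j]: min cost of aligning a[:i] with b[:j]
    del_aff = table()    # ends in a gap consuming a[i-1], affine pricing
    del_cap = table()    # ends in a gap consuming a[i-1], capped pricing
    ins_aff = table()    # ends in a gap consuming b[j-1], affine pricing
    ins_cap = table()    # ends in a gap consuming b[j-1], capped pricing
    best[0][0] = 0

    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            if i > 0:
                # Open a new gap or extend the current one.
                del_aff[i][j] = min(best[i - 1][j] + gap_open + gap_extend,
                                    del_aff[i - 1][j] + gap_extend)
                # A capped gap pays gap_cap once; extending it is free.
                del_cap[i][j] = min(best[i - 1][j] + gap_cap,
                                    del_cap[i - 1][j])
            if j > 0:
                ins_aff[i][j] = min(best[i][j - 1] + gap_open + gap_extend,
                                    ins_aff[i][j - 1] + gap_extend)
                ins_cap[i][j] = min(best[i][j - 1] + gap_cap,
                                    ins_cap[i][j - 1])
            cost = min(del_aff[i][j], del_cap[i][j],
                       ins_aff[i][j], ins_cap[i][j])
            if i > 0 and j > 0:
                sub = 0 if a[i - 1] == b[j - 1] else mismatch
                cost = min(cost, best[i - 1][j - 1] + sub)
            best[i][j] = cost
    return best[m][n]


if __name__ == "__main__":
    cdna = "ACGTACGT"
    genomic = "ACGT" + "G" * 20 + "ACGT"  # toy "intron" of 20 bases
    print(align_cost(cdna, genomic))
```

In the toy example, the 20-base insertion in the genomic sequence is charged the constant cap rather than an affine cost of 22, which is the behaviour the restricted penalty is meant to provide for cDNA-to-genomic alignment.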
Calign is available free of charge by anonymous ftp at iubio.bio.indiana.edu, directory molbio/align, files calign.driver.c and calign.c. Another URL reference for the files is http://iubio.bio.indiana.edu/soft/molbio/align/+++calign.c.