Chao K M, Zhang J, Ostell J, Miller W
Department of Computer Science and Information Management, Providence University, Shalu, Taichung, Taiwan.
Comput Appl Biosci. 1997 Feb;13(1):75-80. doi: 10.1093/bioinformatics/13.1.75.
We have produced a computer program, named sim3, that solves the following computational problem. Two DNA sequences are given, where the shorter sequence is very similar to some contiguous region of the longer sequence. Sim3 determines such a similar region of the longer sequence, and then computes an optimal set of single-nucleotide changes (i.e. insertions, deletions or substitutions) that will convert the shorter sequence to that region. Thus, the alignment scoring scheme is designed to model sequencing errors, rather than evolutionary processes. The program can align a 100 kb sequence to a 1 megabase sequence in a few seconds on a workstation, provided that there are very few differences between the shorter sequence and some region in the longer sequence. The program has been used to assemble sequence data for the Genomes Division at the National Center for Biotechnology Information.
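The problem stated above, finding the region of the longer sequence closest to the shorter one and then recovering the single-nucleotide edits, can be illustrated with a textbook "fitting" alignment under unit edit costs. The sketch below is only an illustration of the problem, not sim3's algorithm: the abstract does not describe the method, and this quadratic-time dynamic program would be far too slow for a 100 kb against 1 megabase comparison. The function name `fit_align`, the edit-tuple layout, and the unit costs are all assumptions made for this example.

```python
def fit_align(short, long):
    """Locate the region of `long` nearest to `short` under unit-cost edits.

    Hypothetical helper for illustration; not the sim3 implementation.
    Returns (distance, start, end, edits), where long[start:end] is the
    matched region and `edits` converts `short` into that region.
    """
    m, n = len(short), len(long)
    # dp[i][j] = fewest edits turning short[:i] into some suffix of long[:j].
    # Row 0 stays all zeros, so the region may start anywhere in `long`.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i  # only option: delete the first i bases of `short`
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitute = dp[i - 1][j - 1] + (short[i - 1] != long[j - 1])
            delete = dp[i - 1][j] + 1   # drop short[i-1]
            insert = dp[i][j - 1] + 1   # add long[j-1]
            dp[i][j] = min(substitute, delete, insert)

    # The region may end anywhere: take the cheapest cell in the last row.
    end = min(range(n + 1), key=lambda j: dp[m][j])

    # Trace back to recover the edits and the start of the region.
    edits = []
    i, j = m, end
    while i > 0:
        if j > 0 and dp[i][j] == dp[i - 1][j - 1] + (short[i - 1] != long[j - 1]):
            if short[i - 1] != long[j - 1]:
                edits.append(("substitute", i - 1, short[i - 1], long[j - 1]))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            edits.append(("delete", i - 1, short[i - 1], ""))
            i -= 1
        else:
            # Base long[j-1] is inserted after the first i bases of `short`.
            edits.append(("insert", i, "", long[j - 1]))
            j -= 1
    edits.reverse()
    return dp[m][end], j, end, edits


if __name__ == "__main__":
    print(fit_align("ACGTTGCA", "TTTTACGTAGCATTTT"))
```

In this toy input the shorter sequence differs from the region `long[4:12]` by one substitution. Because every edit costs the same, the score counts differences in the way one would count sequencing errors, rather than weighting them by an evolutionary model.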