Halperin E, Faigler S, Gill-More R
Compugen Ltd., Tel Aviv, Israel.
Bioinformatics. 1999 Nov;15(11):867-73. doi: 10.1093/bioinformatics/15.11.867.
Automated annotation of Expressed Sequence Tags (ESTs) is becoming increasingly important as EST databases continue to grow rapidly. A common approach to annotation is to align the gene fragments against well-documented databases of protein sequences. The sensitivity of the alignment algorithm is key to the success of such methods.
This paper introduces a new algorithm, FramePlus, for DNA-protein sequence alignment. The SCOP database was used to develop a general framework for testing the sensitivity of such alignment algorithms when searching large databases. Using this framework, the performance of FramePlus was found to be somewhat better than that of other algorithms in the presence of moderate and high rates of frameshift errors, and comparable to that of Translated Search in the absence of sequencing errors.
The source code for FramePlus and the testing datasets are freely available at ftp.compugen.co.il/pub/research.
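To illustrate why frameshift errors matter for DNA-protein alignment, the following minimal Python sketch translates a DNA fragment in all six reading frames, the usual first step of a translated search, and shows how a single deleted base splits the encoded protein across two frames. This is not the FramePlus algorithm. The function names (`translate`, `reverse_complement`, `six_frame`) and the example sequence are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate dna starting at offset `frame` (0, 1 or 2); unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame(dna):
    """Translate all three forward and all three reverse-strand frames."""
    frames = {}
    rc = reverse_complement(dna)
    for f in range(3):
        frames[f"+{f + 1}"] = translate(dna, f)
        frames[f"-{f + 1}"] = translate(rc, f)
    return frames


# Hypothetical error-free fragment encoding the peptide MAKLTE followed by a stop.
dna = "ATGGCCAAGCTGACCGAGTAA"
# Simulate a sequencing error: delete the base at index 7.
shifted = dna[:7] + dna[8:]

print(translate(dna, 0))      # intact reading frame
print(translate(shifted, 0))  # correct up to the deletion, then garbled
print(translate(shifted, 2))  # the downstream peptide reappears in another frame
```

After the deletion, no single frame contains the whole peptide: the start survives in frame +1 and the end in frame +3. A search that scores each translated frame independently sees only fragments, whereas a frameshift-aware aligner can switch frames within one alignment.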