Herbordt Martin C, Model Josh, Sukhwani Bharat, Gu Yongfeng, Vancourt Tom
Department of Electrical and Computer Engineering, Boston University; Boston, MA 02215, Web: http://www.bu.edu/caadlab .
Parallel Comput. 2007 Nov;33(10-11):741-756. doi: 10.1016/j.parco.2007.09.003.
Approximate string matching is fundamental to bioinformatics and has been the subject of numerous FPGA acceleration studies. We address issues with respect to FPGA implementations of both BLAST- and dynamic-programming- (DP) based methods. Our primary contribution is a new algorithm for emulating the seeding and extension phases of BLAST. This operates in a single pass through a database at streaming rate, and with no preprocessing other than loading the query string. Moreover, it emulates parameters turned to maximum possible sensitivity with no slowdown. While current DP-based methods also operate at streaming rate, generating results can be cumbersome. We address this with a new structure for data extraction. We present results from several implementations showing order of magnitude acceleration over serial reference code. A simple extension assures compatibility with NCBI BLAST.
近似字符串匹配是生物信息学的基础,并且一直是众多FPGA加速研究的主题。我们解决了基于BLAST和动态规划(DP)方法的FPGA实现方面的问题。我们的主要贡献是一种用于模拟BLAST的种子生成和扩展阶段的新算法。该算法以流速率单次遍历数据库,除了加载查询字符串外无需进行预处理。此外,它在不降低速度的情况下模拟了转向最大可能灵敏度的参数。虽然当前基于DP的方法也以流速率运行,但生成结果可能很麻烦。我们通过一种新的数据提取结构来解决这个问题。我们展示了几种实现的结果,表明与串行参考代码相比有数量级的加速。一个简单的扩展确保了与NCBI BLAST的兼容性。