A knowledge-based multiple-sequence alignment algorithm.

Affiliations

Clayton State University, Morrow.

Georgia State University, Atlanta.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2013 Jul-Aug;10(4):884-96. doi: 10.1109/TCBB.2013.102.

DOI:10.1109/TCBB.2013.102

PMID:24334383

Abstract

Multiple-sequence alignment is a common and cost-effective way to identify the functions, structures, or relationships of species: DNA, RNA, or protein sequences are arranged and aligned so that similar regions are grouped together. Correctly identifying and aligning these biological sequence similarities helps in applications ranging from unraveling the mystery of species evolution to drug design. We present a knowledge-based multiple-sequence alignment (KB-MSA) technique that uses existing knowledge databases such as SWISSPROT, GENBANK, or HOMSTRAD to provide a more realistic and reliable sequence alignment. We also provide a modified version of this algorithm (CB-MSA) that uses sequence consistency information when sequence knowledge databases are not available. Our benchmark tests on the BAliBASE, PREFAB, HOMSTRAD, and SABMARK references show accuracy improvements of up to 10 percent on twilight data sets against many leading alignment tools such as ISPALIGN, PADT, CLUSTALW, MAFFT, PROBCONS, and T-COFFEE.
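The abstract does not say which accuracy measure underlies the reported improvements. As an illustration only, the sketch below computes a sum-of-pairs style score, a measure commonly used with alignment benchmarks: the fraction of residue pairs aligned in a reference alignment that a test alignment also places in the same column. The function names and the toy sequences are assumptions made for this example and are not taken from the paper.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length strings with '-' as the gap symbol.
    A residue is identified by (sequence index, index among ungapped residues).
    """
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")

    counters = [0] * len(alignment)  # next residue index for each sequence
    pairs = set()
    for col in range(n_cols):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        # Every two residues in the same column form one aligned pair.
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


# Hypothetical toy alignments of the same three sequences.
reference = ["AC-GT", "ACAGT", "A--GT"]
test = ["ACG-T", "ACAGT", "A-G-T"]
score = sum_of_pairs_score(test, reference)
print(f"sum-of-pairs score: {score:.2f}")
```

Both alignments must contain the same sequences in the same order; the score is 1.0 when the test alignment reproduces every aligned pair of the reference.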
