Papadopoulos Jason S, Agarwala Richa
National Center for Biotechnology Information, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA.
Bioinformatics. 2007 May 1;23(9):1073-9. doi: 10.1093/bioinformatics/btm076. Epub 2007 Mar 1.
A tool that simultaneously aligns multiple protein sequences, automatically utilizes information about protein domains, and has a good compromise between speed and accuracy will have practical advantages over current tools.
We describe COBALT, a constraint-based alignment tool that implements a general framework for multiple alignment of protein sequences. COBALT finds a collection of pairwise constraints derived from database searches, sequence similarity and user input, combines these pairwise constraints, and then incorporates them into a progressive multiple alignment. We show that using constraints derived from the conserved domain database (CDD) and the PROSITE protein-motif database improves COBALT's alignment quality. We also show that COBALT has reasonable runtime performance and alignment accuracy comparable to or exceeding that of other tools for a broad range of problems.
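The step of combining pairwise constraints can be illustrated with a minimal sketch. It is not COBALT's implementation: the `Constraint` record, its fields, the scores, and the chaining rule are assumptions made for illustration only. For one pair of sequences, it picks the highest-scoring subset of constraints that can all be satisfied by a single alignment, meaning they appear in the same order in both sequences and do not overlap.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Constraint:
    """Hypothetical pairwise constraint (not COBALT's data structure):
    residues a_start..a_start+length-1 of sequence A should align with
    residues b_start..b_start+length-1 of sequence B."""
    a_start: int
    b_start: int
    length: int   # assumed to be >= 1
    score: float
    source: str   # illustrative label, e.g. "CDD", "PROSITE", "user"


def compatible(first: Constraint, second: Constraint) -> bool:
    """True if `second` starts after `first` ends in both sequences."""
    return (first.a_start + first.length <= second.a_start
            and first.b_start + first.length <= second.b_start)


def best_consistent_subset(constraints: List[Constraint]) -> List[Constraint]:
    """Return the maximum-score chain of mutually compatible constraints,
    found with a simple O(n^2) dynamic program."""
    ordered = sorted(constraints, key=lambda c: (c.a_start, c.b_start))
    if not ordered:
        return []
    best = [c.score for c in ordered]   # best chain score ending at each item
    prev = [-1] * len(ordered)          # back-pointers for traceback
    for j in range(len(ordered)):
        for i in range(j):
            candidate = best[i] + ordered[j].score
            if compatible(ordered[i], ordered[j]) and candidate > best[j]:
                best[j] = candidate
                prev[j] = i
    end = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(ordered[end])
        end = prev[end]
    return chain[::-1]


# Made-up constraints: the second overlaps the first in sequence A, and the
# last two cross each other, so not all four can hold in one alignment.
example = [
    Constraint(0, 0, 5, 10.0, "CDD"),
    Constraint(3, 20, 5, 4.0, "PROSITE"),
    Constraint(10, 12, 4, 6.0, "user"),
    Constraint(20, 8, 3, 7.0, "CDD"),
]
chain = best_consistent_subset(example)
print([(c.a_start, c.b_start, c.source) for c in chain])
```

In a progressive alignment, a consistent set like this would then restrict where each pairwise or profile alignment step is allowed to place residues; that stage is omitted from the sketch.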
COBALT is included in the NCBI C++ toolkit. A Linux executable for COBALT, along with the CDD and PROSITE data used, is available at: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/cobalt