Edgar Robert C, Sjölander Kimmen
195 Roque Moraes Drive, Mill Valley, CA 94941, USA.
Bioinformatics. 2003 Jul 22;19(11):1404-11. doi: 10.1093/bioinformatics/btg158.
Aligning multiple proteins based on sequence information alone is challenging if sequence identity is low or there is a significant degree of structural divergence. We present a novel algorithm (SATCHMO) that is designed to address this challenge. SATCHMO simultaneously constructs a tree and a set of multiple sequence alignments, one for each internal node of the tree. The alignment at a given node contains all sequences within its sub-tree, and predicts which positions in those sequences are alignable and which are not. Aligned regions therefore typically get shorter on a path from a leaf to the root as sequences diverge in structure. Current methods either regard all positions as alignable (e.g. ClustalW) or align only those positions believed to be homologous across all sequences (e.g. profile HMM methods); by contrast, SATCHMO makes different predictions of alignable regions in different subgroups. SATCHMO generates profile hidden Markov models at each node; these are used to determine branching order, to align sequences and to predict structurally alignable regions.
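The idea that every internal node carries its own alignment, with its own set of alignable columns, can be illustrated with a toy sketch. This is not SATCHMO. It assumes the input sequences are already aligned and gap-free, builds the tree with a UPGMA-style merge on percent identity instead of profile HMM scores, and replaces SATCHMO's HMM-based prediction of alignable regions with a hypothetical conservation threshold (`min_frac`). The names `Node`, `build_tree` and `alignable_mask` are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """Tree node: a leaf holds one sequence name, an internal node two children."""
    members: list[str]                              # sequence names in this sub-tree
    children: tuple = ()                            # empty for leaves
    mask: list[bool] = field(default_factory=list)  # alignable columns (internal nodes)


def identity(a: str, b: str) -> float:
    """Fraction of identical columns in two pre-aligned, equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def alignable_mask(seqs: list[str], min_frac: float) -> list[bool]:
    """Toy rule (assumption, not SATCHMO's): a column is alignable if its most
    common residue occurs in at least min_frac of the sequences."""
    mask = []
    for column in zip(*seqs):
        top_count = Counter(column).most_common(1)[0][1]
        mask.append(top_count / len(seqs) >= min_frac)
    return mask


def build_tree(seqs: dict[str, str], min_frac: float = 0.75) -> Node:
    """Greedy UPGMA-style agglomeration; each new internal node gets its own
    alignable-column mask computed from the sequences in its sub-tree."""
    if len({len(s) for s in seqs.values()}) != 1:
        raise ValueError("toy sketch assumes pre-aligned, equal-length sequences")
    clusters = [Node(members=[name]) for name in seqs]

    def avg_identity(a: Node, b: Node) -> float:
        # Mean pairwise identity between two clusters (average linkage).
        pairs = [(x, y) for x in a.members for y in b.members]
        return sum(identity(seqs[x], seqs[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the most similar pair of clusters into a new internal node.
        a, b = max(combinations(clusters, 2), key=lambda pair: avg_identity(*pair))
        members = a.members + b.members
        mask = alignable_mask([seqs[m] for m in members], min_frac)
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(Node(members, (a, b), mask))
    return clusters[0]


if __name__ == "__main__":
    # Two close pairs, (A, B) and (C, D), that share only a 4-residue core.
    seqs = {
        "A": "MKVLAAGGST",
        "B": "MKVLAAGGRN",
        "C": "MKVLWWEQDE",
        "D": "MKVLWWHPKF",
    }
    root = build_tree(seqs)
    for node in (*root.children, root):
        print(node.members, sum(node.mask))
    # ['A', 'B'] 8
    # ['C', 'D'] 6
    # ['A', 'B', 'C', 'D'] 4
```

On this toy input the alignable region shrinks from 8 and 6 columns at the two pair nodes to the 4-column shared core at the root, so each subgroup keeps a longer alignment than the whole family. SATCHMO itself makes these decisions with the profile HMMs it builds at each node; the sketch only mirrors the shape of the output, not the method.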
In experiments on the BAliBASE benchmark alignment database, SATCHMO is shown to perform comparably to ClustalW and the UCSC SAM HMM software. Results using SATCHMO to identify protein domains are demonstrated on potassium channels, with implications for the mechanism by which tumor necrosis factor alpha affects potassium current.
The software is available for download from http://www.drive5.com/lobster/index.htm