Neuwald A F, Liu J S, Lawrence C E
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA.
Protein Sci. 1995 Aug;4(8):1618-32. doi: 10.1002/pro.5560040820.
The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggest that they play important functional roles.
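The core idea of Gibbs sampling for motif detection can be illustrated with a minimal sketch. The code below is not the algorithm described in the abstract: it is a much simpler site sampler that assumes exactly one motif occurrence per sequence, a single motif model of fixed width, and a uniform background distribution. The function name `gibbs_site_sampler` and all of its parameters (`width`, `iterations`, `pseudocount`, `seed`) are hypothetical and chosen only for illustration.

```python
import math
import random

# Standard 20-letter amino acid alphabet; other characters are not handled.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def gibbs_site_sampler(seqs, width, iterations=200, pseudocount=0.5, seed=0):
    """Return one sampled motif start per sequence (simplified site sampler).

    Assumes one motif occurrence per sequence and a uniform background.
    """
    rng = random.Random(seed)
    # Start from random motif positions.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    # Uniform background frequencies, an assumption made for brevity.
    background = 1.0 / len(ALPHABET)
    # Column total: one count per other sequence plus the pseudocounts.
    total = (len(seqs) - 1) + pseudocount * len(ALPHABET)

    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build position-specific counts from every sequence except i.
            counts = [dict.fromkeys(ALPHABET, pseudocount) for _ in range(width)]
            for j, other in enumerate(seqs):
                if j == i:
                    continue
                for k in range(width):
                    counts[k][other[starts[j] + k]] += 1.0

            # Weight each candidate start by its model/background likelihood ratio.
            weights = []
            for pos in range(len(seq) - width + 1):
                log_ratio = 0.0
                for k in range(width):
                    freq = counts[k][seq[pos + k]] / total
                    log_ratio += math.log(freq / background)
                weights.append(math.exp(log_ratio))

            # Sample the new start for sequence i in proportion to the weights.
            starts[i] = rng.choices(range(len(weights)), weights=weights, k=1)[0]

    return starts
```

Calling `gibbs_site_sampler(["MKTAYIAKQR", "GGWLKTAYPP", "QQLKTAYHHE"], width=4)` returns a list with one start index per input sequence. Because the procedure is stochastic, the result depends on `seed`, and in practice several runs from different starting points would be compared. Partitioning sites into several distinct motif models, allowing a variable number of sites per sequence, and testing statistical significance, as the abstract describes, are all beyond this sketch.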