Wang Yong, Sadreyev Ruslan I, Grishin Nick V
Biomedical Engineering Program, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050, USA.
Nucleic Acids Res. 2009 Jun;37(11):3522-30. doi: 10.1093/nar/gkp212. Epub 2009 Apr 7.
Detection of remote sequence homology is essential for the accurate inference of protein structure, function and evolution. The most sensitive detection methods compare the evolutionary patterns reflected in multiple sequence alignments (MSAs) of protein families. We present PROCAIN, a new method for MSA comparison that combines 'vertical' MSA context (substitution constraints at individual sequence positions) with 'horizontal' context (patterns of residue content at multiple positions). Based on a simple and tractable profile methodology and primitive measures of similarity between horizontal MSA patterns, the method achieves homology detection quality comparable to that of a more complex, advanced method employing hidden Markov models (HMMs) and secondary structure (SS) prediction. Adding SS information further improves PROCAIN performance beyond the capabilities of current state-of-the-art tools. The potential value of the method for structure/function predictions is illustrated by the detection of subtle homology between evolutionarily distant yet structurally similar protein domains. ProCAIn, relevant databases and tools can be downloaded from http://prodata.swmed.edu/procain/download. The web server can be accessed at http://prodata.swmed.edu/procain/procain.php.
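The idea of combining 'vertical' and 'horizontal' MSA context can be illustrated with a toy sketch. This is not PROCAIN's scoring scheme, which the abstract does not specify. The function names, the use of cosine similarity, the window half-width and the `weight` parameter are all assumptions made for illustration only. Here the vertical term compares residue frequencies at two aligned columns, and the horizontal term compares the average residue composition in a window around each column.

```python
from collections import Counter
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids


def column_frequencies(msa, pseudocount=1.0):
    """Per-column residue frequencies of an MSA (rows of equal length).

    Gap and non-standard characters are ignored; a uniform pseudocount
    keeps every frequency strictly positive.
    """
    profile = []
    for column in zip(*msa):
        counts = Counter(c for c in column if c in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append([(counts[a] + pseudocount) / total for a in ALPHABET])
    return profile


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def window_composition(profile, i, half_width=2):
    """Average residue frequencies over the columns around position i."""
    lo, hi = max(0, i - half_width), min(len(profile), i + half_width + 1)
    window = profile[lo:hi]
    return [sum(col[k] for col in window) / len(window)
            for k in range(len(ALPHABET))]


def position_score(p, q, i, j, weight=0.5, half_width=2):
    """Hypothetical combined score for column i of p versus column j of q."""
    # 'Vertical' context: similarity of the two columns themselves.
    vertical = cosine(p[i], q[j])
    # 'Horizontal' context: similarity of the surrounding residue content.
    horizontal = cosine(window_composition(p, i, half_width),
                        window_composition(q, j, half_width))
    return vertical + weight * horizontal
```

In a full profile-profile comparison, a score of this kind would typically fill a matrix over all column pairs that is then passed to a dynamic-programming alignment. That step is omitted here.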