Sunyaev S R, Eisenhaber F, Rodchenkov I V, Eisenhaber B, Tumanyan V G, Kuznetsov E N
European Molecular Biology Laboratory, Meyerhofstrasse 1, Postfach 10.2209, D-69012 Heidelberg, Germany.
Protein Eng. 1999 May;12(5):387-94. doi: 10.1093/protein/12.5.387.
Sequence weighting techniques are aimed at balancing redundant observed information from subsets of similar sequences in multiple alignments. Traditional approaches apply the same weight to all positions of a given sequence, hence equal efficiency of phylogenetic changes is assumed along the whole sequence. This restrictive assumption is not required for the new method PSIC (position-specific independent counts) described in this paper. The number of independent observations (counts) of an amino acid type at a given alignment position is calculated from the overall similarity of the sequences that share the amino acid type at this position with the help of statistical concepts. This approach allows the fast computation of position-specific sequence weights even for alignments containing hundreds of sequences. The PSIC approach has been applied to profile extraction and to the fold family assignment of protein sequences with known structures. Our method was shown to be very productive in finding distantly related sequences and more powerful than Hidden Markov Models or the profile methods in WiseTools and PSI-BLAST in many cases. The profile extraction routine is available on the WWW (http://www.bork.embl-heidelberg.de/PSIC or http://www.imb.ac.ru/PSIC).
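The idea of counting independent observations per position can be illustrated with a small Python sketch. It is not the statistical estimator from the paper, which the abstract does not spell out. As a stand-in, it assumes a simple heuristic: for each amino acid type in a column, take the sequences that share it, measure their mean pairwise identity over the other columns, and shrink the raw count toward 1 as that identity approaches 100%. The function names `pairwise_identity` and `effective_counts`, the gap symbol `-`, and the shrinkage formula are all assumptions made for illustration.

```python
from itertools import combinations


def pairwise_identity(a, b, skip):
    """Fraction of identical residues between two aligned sequences.

    Column `skip` and any column with a gap in either sequence are ignored.
    """
    matches = 0
    compared = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if i == skip or x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def effective_counts(alignment, column):
    """Return {residue: effective count} for one alignment column.

    Assumed heuristic (not the PSIC formula): n sequences sharing a residue
    count as 1 + (n - 1) * (1 - mean pairwise identity) observations.
    """
    # Group the sequences by the residue they carry in this column.
    groups = {}
    for seq in alignment:
        residue = seq[column]
        if residue != "-":
            groups.setdefault(residue, []).append(seq)

    counts = {}
    for residue, seqs in groups.items():
        n = len(seqs)
        if n == 1:
            counts[residue] = 1.0
            continue
        pairs = list(combinations(seqs, 2))
        mean_identity = sum(
            pairwise_identity(a, b, column) for a, b in pairs
        ) / len(pairs)
        # Identical sequences collapse to one observation;
        # fully dissimilar ones each count as a separate observation.
        counts[residue] = 1.0 + (n - 1) * (1.0 - mean_identity)
    return counts
```

For the toy alignment `["ACDE", "ACDE", "AKLM", "GCDE"]` and column 0, the three sequences carrying `A` yield an effective count of about 2.33 rather than 3, because two of them are identical elsewhere. Because the count depends on which sequences share the residue at that particular column, the resulting weights are position-specific, in contrast to a single weight per sequence.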