Panchenko Anna R, Bryant Stephen H
Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA.
Protein Sci. 2002 Feb;11(2):361-70. doi: 10.1110/ps.19902.
Sequence comparison methods based on position-specific score matrices (PSSMs) have proven a useful tool for recognition of the divergent members of a protein family and for annotation of functional sites. Here we investigate one of the factors that affect the overall performance of PSSMs in a PSI-BLAST search: the algorithm used to construct the seed alignment upon which the PSSM is based. We compare PSSMs based on alignments constructed by global sequence similarity (ClustalW and ClustalW-pairwise), local sequence similarity (BLAST), and local structure similarity (VAST). To assess performance with respect to identification of conserved functional or structural sites, we examine the accuracy of the three-dimensional molecular models predicted by PSSM-sequence alignments. Using the known structures of those sequences as the standard of truth, we find that model accuracy varies with the algorithm used for seed alignment construction, in the order local-structure (VAST) > local-sequence (BLAST) > global-sequence (ClustalW). Using structural similarity of query and database proteins as the standard of truth, we find that PSSM recognition sensitivity depends primarily on the diversity of the sequences included in the alignment, with an optimum around 30-50% average pairwise identity. We discuss these observations and suggest a strategy for constructing seed alignments that optimize PSSM-sequence alignment accuracy and recognition sensitivity.
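The abstract reports that recognition sensitivity is best when the seed alignment has roughly 30-50% average pairwise identity. The sketch below illustrates one way such a diversity measure could be computed for a candidate seed alignment. It is not the authors' procedure: the abstract does not say how identity is normalized, so the code assumes identity is counted over columns where both sequences are ungapped, and the function names (`pairwise_identity`, `average_pairwise_identity`, `in_reported_optimum`) are hypothetical.

```python
from itertools import combinations
from typing import Sequence

GAP = "-"  # assumed gap character in the aligned sequences


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues between two aligned sequences.

    Assumption: only columns where both sequences are ungapped are counted.
    Other normalizations (e.g. by the shorter sequence length) are also in use.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return matches / len(columns)


def average_pairwise_identity(alignment: Sequence[str]) -> float:
    """Mean of pairwise_identity over all unordered pairs of rows."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two aligned sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


def in_reported_optimum(alignment: Sequence[str],
                        low: float = 0.30, high: float = 0.50) -> bool:
    """True if the alignment falls in the 30-50% window quoted in the abstract."""
    return low <= average_pairwise_identity(alignment) <= high


# Toy alignment, invented purely for illustration (not data from the paper).
toy_alignment = [
    "ACDEFGHIKL",
    "ACDEYGHLKL",
    "TCDQFAH-KM",
]
```

Calling `average_pairwise_identity(toy_alignment)` gives the mean identity of the three rows, and `in_reported_optimum(toy_alignment)` reports whether that value lies in the quoted window. A check like this could be used to decide whether a seed alignment needs more divergent members before a PSSM is built from it.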