Gribskov M, McLachlan A D, Eisenberg D
Proc Natl Acad Sci U S A. 1987 Jul;84(13):4355-8. doi: 10.1073/pnas.84.13.4355.
Profile analysis is a method for detecting distantly related proteins by sequence comparison. The basis for comparison is not only the customary Dayhoff mutational-distance matrix but also the results of structural studies and information implicit in the alignments of the sequences of families of similar proteins. This information is expressed in a position-specific scoring table (profile), which is created from a group of sequences previously aligned by structural or sequence similarity. The similarity of any other sequence (target) to the group of aligned sequences (probe) can be tested by comparing the target to the profile using dynamic programming algorithms. The profile method differs in two major respects from methods of sequence comparison in common use: (i) Any number of known sequences can be used to construct the profile, allowing more information to be used in the testing of the target than is possible with pairwise alignment methods. (ii) The profile includes the penalties for insertion or deletion at each position, which allow one to include the probe secondary structure in the testing scheme. Tests with globin and immunoglobulin sequences show that profile analysis can distinguish all members of these families from all other sequences in a database containing 3800 protein sequences.
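The two ingredients named in the abstract, a position-specific scoring table built from aligned probe sequences and a dynamic programming comparison of a target against it, can be illustrated with a minimal sketch. This is not the authors' implementation. The match/mismatch function `substitution` is a toy stand-in for the Dayhoff matrix, the rule that lowers the gap penalty in columns where the probe alignment already contains gaps is an assumed simplification of the position-specific insertion/deletion penalties, and the names `build_profile`, `score_target`, and `base_gap` are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitution(a, b):
    # Assumption: toy match/mismatch scores standing in for the Dayhoff matrix.
    return 2.0 if a == b else -1.0


def build_profile(alignment, base_gap=3.0):
    """Turn aligned probe sequences ('-' marks a gap) into a list of
    (scores, gap_penalty) pairs, one pair per alignment column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for col in range(length):
        column = [row[col] for row in alignment]
        counts = Counter(r for r in column if r != "-")
        total = sum(counts.values())

        # Score for residue a = frequency-weighted average similarity of a
        # to the residues observed in this column.
        scores = {}
        for a in AMINO_ACIDS:
            if total == 0:
                scores[a] = 0.0
            else:
                scores[a] = sum(n / total * substitution(a, b)
                                for b, n in counts.items())

        # Assumption: gaps are cheaper where the probe alignment has gaps.
        gap_fraction = 1.0 - total / len(column)
        profile.append((scores, base_gap * (1.0 - gap_fraction)))
    return profile


def score_target(profile, target):
    """Best local alignment score of target against the profile
    (Smith-Waterman style, linear position-specific gap penalty)."""
    n, m = len(profile), len(target)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        scores, gap = profile[i - 1]
        for j in range(1, m + 1):
            match = H[i - 1][j - 1] + scores.get(target[j - 1], -1.0)
            skip_profile = H[i - 1][j] - gap  # profile position left unmatched
            skip_target = H[i][j - 1] - gap   # target residue left unmatched
            H[i][j] = max(0.0, match, skip_profile, skip_target)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    probe = ["ACDE", "ACDE", "AC-E"]  # made-up aligned probe sequences
    profile = build_profile(probe)
    for target in ("ACDE", "ACE", "WWWW"):
        print(target, score_target(profile, target))
```

Because every probe sequence contributes to each column's scores, adding sequences to `probe` changes the table without changing the comparison step, which is the first of the two differences from pairwise methods that the abstract points out.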