Kalinina Olga V, Mironov Andrey A, Gelfand Mikhail S, Rakhmaninova Aleksandra B
State Scientific Center GosNIIGenetika, 1st Dorozhny pr., 1, Moscow 113545, Russia.
Protein Sci. 2004 Feb;13(2):443-56. doi: 10.1110/ps.03191704.
The increasing volume of genomic data opens new possibilities for analysis of protein function. We introduce a method for automated selection of residues that determine the functional specificity of proteins with a common general function (the specificity-determining positions [SDP] prediction method). Such residues are assumed to be conserved within groups of orthologs (that may be assumed to have the same specificity) and to vary between paralogs. Thus, considering a multiple sequence alignment of a protein family divided into orthologous groups, one can select positions where the distribution of amino acids correlates with this division. Unlike previously published techniques, the introduced method directly takes into account nonuniformity of amino acid substitution frequencies. In addition, it does not require setting arbitrary thresholds. Instead, a formal procedure for threshold selection using the Bernoulli estimator is implemented. We tested the SDP prediction method on the LacI family of bacterial transcription factors and a sample of bacterial water and glycerol transporters belonging to the major intrinsic protein (MIP) family. In both cases, the comparison with available experimental and structural data strongly supported our predictions.
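The core idea, selecting alignment columns whose amino acid distribution correlates with the division into orthologous groups, can be illustrated with a minimal sketch. It assumes mutual information between residue identity and group label as the per-column association score; the abstract does not name the statistic, so this choice, the function names `column_mutual_information` and `score_alignment`, and the toy alignment are all hypothetical. The sketch deliberately omits the two features the abstract highlights: correction for nonuniform amino acid substitution frequencies and the Bernoulli-estimator threshold selection.

```python
import math
from collections import Counter


def column_mutual_information(column, groups):
    """Mutual information (bits) between residue and group label in one column.

    Assumption: mutual information stands in for the association score;
    the abstract does not specify the statistic actually used.
    """
    n = len(column)
    joint = Counter(zip(column, groups))
    residue_counts = Counter(column)
    group_counts = Counter(groups)
    mi = 0.0
    for (residue, group), count in joint.items():
        p_joint = count / n
        p_residue = residue_counts[residue] / n
        p_group = group_counts[group] / n
        mi += p_joint * math.log2(p_joint / (p_residue * p_group))
    return mi


def score_alignment(sequences, groups):
    """Score every column of an alignment given one group label per sequence."""
    if len(sequences) != len(groups):
        raise ValueError("need exactly one group label per sequence")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("aligned sequences must have equal length")
    return [
        column_mutual_information([seq[i] for seq in sequences], groups)
        for i in range(length)
    ]


# Hypothetical toy alignment: two orthologous groups, two sequences each.
sequences = ["AKL", "AKL", "ARL", "ARL"]
groups = ["g1", "g1", "g2", "g2"]
scores = score_alignment(sequences, groups)

# Rank columns from most to least group-associated.
ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(scores, ranked)
```

In the toy alignment only the middle column (K in one group, R in the other) separates the groups, so it receives the highest score, while the two fully conserved columns score zero. A real analysis would still need a significance criterion to decide how many top-ranked columns to report, which is the role the abstract assigns to the Bernoulli estimator.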