Espadaler Jordi, Aragüés Ramón, Eswar Narayanan, Marti-Renom Marc A, Querol Enrique, Avilés Francesc X, Sali Andrej, Oliva Baldomero
Laboratori de Bioinformàtica Estructural, Grup de Recerca en Informàtica Biomèdica-Institut Municipal d'Investigació Médica (GRIB-IMIM), Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain.
Proc Natl Acad Sci U S A. 2005 May 17;102(20):7151-6. doi: 10.1073/pnas.0500831102. Epub 2005 May 9.
The function of an uncharacterized protein is usually inferred either from its homology to, or its interactions with, characterized proteins. Here, we use both sequence similarity and protein interactions to identify relationships between remotely related protein sequences. We rely on the fact that homologous sequences share similar interactions and, therefore, the set of interacting partners of the partners of a given protein is enriched in its homologs. The approach was benchmarked by assigning the fold and functional family to test sequences of known structure. Specifically, we relied on 1,434 proteins with known folds, as defined in the Structural Classification of Proteins (SCOP) database, and with known interacting partners, as defined in the Database of Interacting Proteins (DIP). For this subset, the specificity of fold assignment increased from 54% for position-specific iterative BLAST (PSI-BLAST) to 75% for our approach, with a concomitant increase in sensitivity of a few percentage points. Similarly, the specificity of family assignment at the e-value threshold of 10^-8 increased from 70% to 87%. The proposed method would be a useful tool for large-scale automated discovery of remote relationships between protein sequences, given its unique reliance on both sequence similarity and protein-protein interactions.
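The central idea, that the partners of a protein's partners are enriched in its homologs, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy interaction list, and the idea of simply filtering a list of weak sequence hits are all assumptions made for illustration. The abstract does not specify how interaction evidence and PSI-BLAST scores are actually combined.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected interaction graph as a dict of partner sets."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def partners_of_partners(graph, protein):
    """Return every protein that shares at least one partner with `protein`."""
    second_shell = set()
    for partner in graph.get(protein, ()):
        second_shell |= graph.get(partner, set())
    second_shell.discard(protein)  # the query itself is not a candidate
    return second_shell


def filter_candidates(graph, query, weak_hits):
    """Keep only weak sequence hits that also share a partner with the query.

    `weak_hits` stands in for low-confidence sequence-similarity matches
    (hypothetical input; order is preserved).
    """
    shared = partners_of_partners(graph, query)
    return [hit for hit in weak_hits if hit in shared]


# Toy data (hypothetical identifiers, not taken from DIP or SCOP).
edges = [("Q", "P1"), ("Q", "P2"), ("H1", "P1"), ("H2", "P2"), ("X", "Y")]
graph = build_graph(edges)

weak_hits = ["H1", "X", "H2"]
supported = filter_candidates(graph, "Q", weak_hits)
print(supported)
```

In this toy example, `H1` and `H2` each share an interacting partner with the query `Q`, so they are retained, whereas `X` has no partner in common and is dropped. Requiring both kinds of evidence is one plausible way such a combination could raise specificity, since a spurious sequence hit is unlikely to also share interaction partners with the query.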