Shah Parantu K, Tripathi Lokesh P, Jensen Lars Juhl, Gahnim Murad, Mason Christopher, Furlong Eileen E, Rodrigues Veronica, White Kevin P, Bork Peer, Sowdhamini R
European Molecular Biology Laboratory, Meyerhofstrasse 1, Heidelberg, Germany.
Gene. 2008 Jan 15;407(1-2):199-215. doi: 10.1016/j.gene.2007.10.012. Epub 2007 Oct 15.
Systematically annotating the function of enzymes that belong to large protein families encoded in a single eukaryotic genome is a very challenging task. We carried out such an exercise to annotate the function of the serine-protease family of the trypsin fold in Drosophila melanogaster, with an emphasis on serine-protease homologues (SPHs) that may have lost their catalytic function. Our approach uses data mining and data integration to provide function annotations for 190 Drosophila gene products containing serine-protease-like domains, of which 35 are SPHs. This was accomplished by combining analysis of structure-function relationships, gene-expression profiles and large-scale protein-protein interaction data with literature mining and bioinformatic tools. We introduce functional residue clustering (FRC), a method that performs hierarchical clustering of sequences using properties of functionally important residues and uses the correlation coefficient as a quantitative similarity measure to transfer in vivo substrate specificities to proteases. We show that the efficiency of transfer of substrate-specificity information using this method is generally high. FRC was also applied to Drosophila proteases to assign putative competitive inhibitor relationships (CIRs). Microarray gene-expression data were used to uncover a large-scale, dual involvement of proteases in development and in the immune response. We found specific recruitment of SPHs and of proteases with CLIP domains in the immune response, suggesting the evolution of a new function for SPHs. We also suggest the existence of separate downstream protease cascades for the immune response against bacterial/fungal infections and against parasite/parasitoid infections. We verify the quality of our annotations using information from RNAi screens and other evidence types. Using such a multifaceted approach results in a 10-fold increase in function annotation for Drosophila serine proteases and demonstrates its value for increasing annotations in multiple genomes.
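The FRC idea described above (encode functionally important residues by their properties, compare sequences by correlation coefficient, cluster hierarchically, then transfer a known substrate specificity within a cluster) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which residue properties, linkage rule or similarity cut-off were used, so the Kyte-Doolittle hydropathy scale, the simple charge encoding, average linkage, the 0.8 threshold, and all sequence names, residue strings and specificity labels below are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Assumption: two illustrative residue properties (not specified in the abstract).
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified formal charge; residues not listed are treated as neutral.
CHARGE = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0}


def encode(residues):
    """Turn a string of functionally important residues into a property vector."""
    vector = []
    for aa in residues:
        vector.extend([HYDROPATHY[aa], CHARGE.get(aa, 0.0)])
    return vector


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def cluster(profiles, min_similarity):
    """Average-linkage agglomerative clustering that stops merging once the
    best remaining pair of clusters falls below min_similarity."""
    vectors = {name: encode(residues) for name, residues in profiles.items()}
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        sims = [pearson(vectors[a], vectors[b]) for a in c1 for b in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[i], clusters[j]) < min_similarity:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


def transfer_specificity(clusters, known):
    """Copy a specificity label to unannotated members of a cluster, but only
    when the annotated members of that cluster all agree on a single label."""
    predicted = {}
    for members in clusters:
        labels = {known[m] for m in members if m in known}
        if len(labels) == 1:
            label = next(iter(labels))
            for m in members:
                if m not in known:
                    predicted[m] = label
    return predicted


# Hypothetical toy data: four residues per sequence at assumed functional positions.
profiles = {"ref_A": "DGGS", "query_1": "DGGT", "ref_B": "SVTL", "query_2": "SVTI"}
known = {"ref_A": "specificity_X", "ref_B": "specificity_Y"}

clusters = cluster(profiles, min_similarity=0.8)
predicted = transfer_specificity(clusters, known)
print(predicted)
```

Requiring a single agreed label per cluster is a conservative choice made for this sketch: a cluster containing references with conflicting specificities yields no prediction rather than a guess.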