Masseroli Marco, Bellistri Elisa, Franceschini Andrea, Pinciroli Francesco
Dipartimento di Elettronica e Informazione, Politecnico di Milano, piazza Leonardo da Vinci 32, 20133 Milano, Italy.
BMC Bioinformatics. 2007 Mar 8;8 Suppl 1(Suppl 1):S14. doi: 10.1186/1471-2105-8-S1-S14.
The growing body of protein family- and domain-based annotations provides important information for understanding protein functions and for gaining insight into the relations among their encoding genes. To enable the analysis of gene proteomic annotations, we implemented new modules within GFINDer, a Web system we previously developed that dynamically aggregates functional and phenotypic annotations of user-uploaded gene lists and supports their statistical analysis and mining.
Exploiting protein information in the Pfam and InterPro databanks, we developed and added to GFINDer new modules specifically devoted to exploring and analyzing the functional signatures of gene protein products. They allow users to annotate large numbers of user-classified nucleotide sequence identifiers with controlled information on related protein families, domains and functional sites, to classify them according to these protein annotation categories, and to analyze the resulting classifications statistically. In particular, when the uploaded nucleotide sequence identifiers are subdivided into classes, the Statistics Protein Families&Domains module estimates the relevance of Pfam or InterPro controlled annotations for the uploaded genes by highlighting protein signatures that are significantly over-represented within user-defined classes of genes. In addition, the Logistic Regression module identifies the protein functional signatures that best explain the considered gene classification.
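The abstract does not state which statistical test the Statistics Protein Families&Domains module applies. As an illustration only, the sketch below assumes a one-sided hypergeometric (Fisher-type) over-representation test, a common choice for this kind of analysis: for each user-defined class and each protein signature, it computes the probability of observing at least as many class members carrying that signature as actually observed. The function names, gene identifiers and class labels are hypothetical; the Pfam accessions are used only as example labels.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated genes across all classes
    K: genes carrying the signature (e.g. a Pfam domain)
    n: genes in the user-defined class
    k: genes in the class carrying the signature
    """
    if not (0 <= k <= n <= N and k <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the upper tail of the hypergeometric distribution.
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def enrichment_table(gene_class: dict, gene_sigs: dict) -> dict:
    """Hypothetical helper: p-value for every (class, signature) pair
    with at least one annotated class member. No multiple-testing
    correction is applied in this sketch."""
    genes = [g for g in gene_class if g in gene_sigs]  # annotated genes only
    N = len(genes)
    classes = sorted({gene_class[g] for g in genes})
    sigs = sorted({s for g in genes for s in gene_sigs[g]})
    results = {}
    for c in classes:
        members = [g for g in genes if gene_class[g] == c]
        n = len(members)
        for s in sigs:
            K = sum(1 for g in genes if s in gene_sigs[g])
            k = sum(1 for g in members if s in gene_sigs[g])
            if k > 0:
                results[(c, s)] = hypergeom_enrichment_p(k, n, K, N)
    return results


# Illustrative usage with made-up gene identifiers and classes.
gene_class = {"g1": "up", "g2": "up", "g3": "up",
              "g4": "down", "g5": "down", "g6": "down"}
gene_sigs = {"g1": {"PF00001"}, "g2": {"PF00001"}, "g3": {"PF00001", "PF00069"},
             "g4": {"PF00069"}, "g5": set(), "g6": {"PF00069"}}
table = enrichment_table(gene_class, gene_sigs)
```

In this toy input, all three genes carrying PF00001 fall in the "up" class, so that pair receives the smallest p-value (1/20). A real analysis would also correct for the many signatures tested at once.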
The new GFINDer modules provide genomic protein family and domain analyses that support better functional interpretation of gene classes, for instance those defined through statistical and clustering analyses of gene expression results from microarray experiments. They can therefore help in understanding fundamental biological processes and complex cellular mechanisms influenced by protein domain composition, and contribute to unveiling new biomedical knowledge about the encoding genes.