Forslund Kristoffer, Sonnhammer Erik L L
Stockholm Bioinformatics Centre, Stockholm University, 10691 Stockholm, Sweden.
Bioinformatics. 2008 Aug 1;24(15):1681-7. doi: 10.1093/bioinformatics/btn312. Epub 2008 Jun 30.
Computational assignment of protein function may be the single most vital application of bioinformatics in the post-genome era. These assignments are based on various protein features, one of which is the presence of identifiable domains. Investigating the relationship between protein domain content and function is important for understanding how combinations of domains encode complex functions.
We present two models of how combinations of protein domains yield specific functions: one rule-based and one probabilistic. We demonstrate how both are useful for Gene Ontology annotation transfer. The first is an intuitive generalization of the Pfam2GO mapping and detects cases where a set of domains strictly implies a function. The second uses a probabilistic model to represent the relationship between domain content and annotation terms, and was found to be better suited to incomplete training sets. We implemented these models as predictors of Gene Ontology functional annotation terms. On a large-scale dataset, both predictors were more accurate than conventional best-BLAST-hit annotation transfer and more sensitive than a single-domain model. We present a number of cases where combinations of Pfam-A protein domains predict functional terms that do not follow from the individual domains.
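The abstract does not give the exact form of either model, so the following is only a minimal Python sketch of the two ideas under stated assumptions, not the authors' MultiPfam2GO scripts. The rule-based part assumes a domain combination implies a term when every training protein containing that combination carries the term (with a hypothetical `min_support` threshold); `emergent_terms` then keeps only terms not already implied by any smaller sub-combination. The probabilistic part swaps in a simple Laplace-smoothed conditional frequency as a stand-in, which may differ from the model used in the paper. All function names, parameters (`max_size`, `min_support`, `alpha`) and the toy domain/term identifiers are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy training data: (domain content, annotated terms) per protein.
# The identifiers are placeholders, not real Pfam families or GO terms.
training = [
    (frozenset({"domA", "domB"}), frozenset({"term1", "term2"})),
    (frozenset({"domA", "domB", "domC"}), frozenset({"term1", "term2", "term3"})),
    (frozenset({"domA"}), frozenset({"term1"})),
    (frozenset({"domB"}), frozenset({"term4"})),
]


def strict_rules(training, max_size=2, min_support=2):
    """Rule-based sketch: domain set D implies term t if every training protein
    whose domain content contains D is annotated with t (assumed criterion)."""
    annotations = defaultdict(list)
    for domains, terms in training:
        # Enumerate every domain combination of up to max_size present in this protein.
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(domains), k):
                annotations[frozenset(combo)].append(terms)
    rules = {}
    for combo, term_sets in annotations.items():
        if len(term_sets) < min_support:  # too few proteins to trust a strict rule
            continue
        shared = frozenset.intersection(*term_sets)  # terms common to all such proteins
        if shared:
            rules[combo] = shared
    return rules


def emergent_terms(rules, combo):
    """Terms implied by the combination but by none of its proper sub-combinations."""
    implied_by_parts = set()
    for k in range(1, len(combo)):
        for sub in combinations(sorted(combo), k):
            implied_by_parts |= rules.get(frozenset(sub), frozenset())
    return set(rules.get(combo, frozenset())) - implied_by_parts


def term_probabilities(training, domain_set, alpha=1.0):
    """Probabilistic stand-in: Laplace-smoothed estimate of P(term | protein contains domain_set)."""
    matches = [terms for domains, terms in training if domain_set <= domains]
    counts = Counter(t for terms in matches for t in terms)
    n = len(matches)
    return {t: (c + alpha) / (n + 2 * alpha) for t, c in counts.items()}


rules = strict_rules(training)
pair = frozenset({"domA", "domB"})
print(emergent_terms(rules, pair))  # {'term2'}
print(sorted(term_probabilities(training, pair).items()))
```

In this toy data, "term2" follows from the pair domA+domB but from neither domain alone, mirroring the kind of case the abstract describes. The smoothed estimate does not demand that every matching protein share a term, which is one plausible reason a probabilistic model could cope better with incomplete training annotations.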
Scripts and documentation are available for download at http://sonnhammer.sbc.su.se/multipfam2go_source_docs.tar