Protein domain recurrence and order can enhance prediction of protein functions.

Affiliation

Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia.

Publication information

Bioinformatics. 2012 Sep 15;28(18):i444-i450. doi: 10.1093/bioinformatics/bts398.

Abstract

MOTIVATION

Burgeoning sequencing technologies have generated massive amounts of genomic and proteomic data. Annotating the functions of the proteins identified in these data has become a large and pressing problem. Various computational methods have been developed to infer protein functions from either the sequences or the domains of proteins. Existing methods, however, ignore the recurrence and the order of protein domains when inferring function.

RESULTS

We developed two new methods to infer protein functions based on protein domain recurrence and domain order. Our first method, DRDO, calculates the posterior probability of the Gene Ontology terms based on domain recurrence and domain order information, whereas our second method, DRDO-NB, relies on the naïve Bayes methodology using the same domain architecture information. Our large-scale benchmark comparisons show strong improvements in the accuracy of the protein function inference achieved by our new methods, demonstrating that domain recurrence and order can provide important information for inference of protein functions.
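To make the idea concrete, the following is a minimal sketch of how domain recurrence and domain order could be turned into features for a naïve Bayes style scorer. It is not the authors' DRDO or DRDO-NB implementation. The feature encoding, the function names (`architecture_features`, `train`, `score`), the smoothing parameter `alpha`, and the toy identifiers (`domA`, `domB`, `T1`, `T2`) are all assumptions made for illustration. As a further simplification, the scorer uses only the features present in the query architecture.

```python
from collections import Counter, defaultdict
import math


def architecture_features(domains):
    """Turn an ordered domain list into recurrence and order features."""
    counts = Counter(domains)
    # Recurrence: each domain paired with the number of times it occurs.
    features = {("count", d, n) for d, n in counts.items()}
    # Order: each adjacent pair of domains, read left to right.
    features |= {("pair", a, b) for a, b in zip(domains, domains[1:])}
    return features


def train(examples):
    """examples: iterable of (ordered_domain_list, set_of_function_terms)."""
    term_counts = Counter()               # proteins annotated with each term
    feature_counts = defaultdict(Counter) # per term: proteins showing each feature
    vocabulary = set()
    for domains, terms in examples:
        feats = architecture_features(domains)
        vocabulary |= feats
        for term in terms:
            term_counts[term] += 1
            feature_counts[term].update(feats)
    return term_counts, feature_counts, vocabulary


def score(domains, model, alpha=1.0):
    """Return a log-score per term; higher means a better match."""
    term_counts, feature_counts, vocabulary = model
    total = sum(term_counts.values())
    # Features never seen in training carry no information, so drop them.
    feats = architecture_features(domains) & vocabulary
    scores = {}
    for term, n in term_counts.items():
        log_p = math.log(n / total)  # prior: share of all annotations
        for f in feats:
            # Laplace-smoothed probability that a protein with this term
            # shows feature f.
            log_p += math.log((feature_counts[term][f] + alpha) / (n + 2 * alpha))
        scores[term] = log_p
    return scores


# Toy data with placeholder identifiers (hypothetical, not real annotations).
toy_examples = [
    (["domA", "domA", "domB"], {"T1"}),
    (["domA", "domB"], {"T1"}),
    (["domB", "domA"], {"T2"}),
]
model = train(toy_examples)
forward = score(["domA", "domB"], model)
reverse = score(["domB", "domA"], model)
print(max(forward, key=forward.get), max(reverse, key=reverse.get))
```

On this toy data the two queries contain the same domains and differ only in order, yet they rank different terms first. A scorer that only records which domains are present could not tell them apart.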

AVAILABILITY

The new models are provided as open source programs at http://sfb.kaust.edu.sa/Pages/Software.aspx.

CONTACT

dkihara@cs.purdue.edu, xin.gao@kaust.edu.sa

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics Online.
