Protein domain recurrence and order can enhance prediction of protein functions.

Affiliation

Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia.

Publication information

Bioinformatics. 2012 Sep 15;28(18):i444-i450. doi: 10.1093/bioinformatics/bts398.

DOI:10.1093/bioinformatics/bts398

PMID:22962465

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3436825/

Abstract

MOTIVATION

Burgeoning sequencing technologies have generated massive amounts of genomic and proteomic data. Annotating the functions of the proteins identified in these data has become a major and crucial problem. Various computational methods have been developed to infer protein functions from either the sequences or the domains of proteins. The existing methods, however, ignore the recurrence and the order of protein domains when inferring function.

RESULTS

We developed two new methods to infer protein functions based on protein domain recurrence and domain order. Our first method, DRDO, calculates the posterior probability of the Gene Ontology terms based on domain recurrence and domain order information, whereas our second method, DRDO-NB, relies on the naïve Bayes methodology using the same domain architecture information. Our large-scale benchmark comparisons show strong improvements in the accuracy of the protein function inference achieved by our new methods, demonstrating that domain recurrence and order can provide important information for inference of protein functions.
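
The abstract does not spell out how DRDO-NB is built. Purely as an illustration, the Python sketch below shows one way a naïve Bayes classifier could consume both signals named above: domain recurrence, encoded as per-domain counts, and domain order, encoded as ordered adjacent domain pairs. Everything here is an assumption rather than the authors' model: the feature encoding, the one-two-class-model-per-GO-term setup, the smoothing parameter `alpha`, the names `architecture_features` and `DomainArchitectureNB`, and the toy domain and term labels.

```python
import math
from collections import Counter


def architecture_features(domains):
    """Hypothetical encoding of an ordered domain architecture (N- to C-terminus).

    Recurrence: ("dom", d) counts how often domain d occurs.
    Order: ("pair", a, b) counts adjacent domain pairs in N-to-C order,
    so ["D1", "D2"] and ["D2", "D1"] yield different features.
    """
    feats = Counter(("dom", d) for d in domains)
    feats.update(("pair", a, b) for a, b in zip(domains, domains[1:]))
    return feats


class DomainArchitectureNB:
    """Hypothetical multinomial naive Bayes scorer (not the published DRDO-NB):
    one two-class model per GO term, "annotated with t" vs. "not annotated"."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing pseudo-count

    def fit(self, proteins):
        """proteins: iterable of (ordered domain list, set of GO terms)."""
        data = [(architecture_features(doms), set(terms)) for doms, terms in proteins]
        self.n = len(data)
        self.vocab = set()
        all_terms = set()
        for feats, terms in data:
            self.vocab.update(feats)
            all_terms.update(terms)
        self.stats = {}
        for t in all_terms:
            counts = {True: Counter(), False: Counter()}
            n_docs = {True: 0, False: 0}
            for feats, terms in data:
                label = t in terms
                counts[label].update(feats)
                n_docs[label] += 1
            self.stats[t] = (counts, n_docs)
        return self

    def predict_proba(self, domains):
        """Return {GO term: P(term | architecture)} under the naive Bayes assumption."""
        query = architecture_features(domains)
        v = len(self.vocab)
        result = {}
        for t, (counts, n_docs) in self.stats.items():
            log_joint = {}
            for label in (True, False):
                prior = (n_docs[label] + self.alpha) / (self.n + 2 * self.alpha)
                total = sum(counts[label].values())
                lp = math.log(prior)
                for feat, k in query.items():
                    if feat not in self.vocab:
                        continue  # features never seen in training are ignored
                    lp += k * math.log((counts[label][feat] + self.alpha) / (total + self.alpha * v))
                log_joint[label] = lp
            # Normalize the two joint log-probabilities (log-sum-exp for stability).
            m = max(log_joint.values())
            z = sum(math.exp(x - m) for x in log_joint.values())
            result[t] = math.exp(log_joint[True] - m) / z
        return result


# Toy, made-up data: "D1".."D3" and "T1"/"T2" are placeholders,
# not real Pfam domains or Gene Ontology terms.
train = [
    (["D1", "D2"], {"T1"}),
    (["D1", "D2", "D2"], {"T1"}),
    (["D2", "D1"], {"T2"}),
    (["D3"], {"T2"}),
]
model = DomainArchitectureNB(alpha=1.0).fit(train)
forward = model.predict_proba(["D1", "D2"])
reverse = model.predict_proba(["D2", "D1"])
print(f"P(T1 | D1-D2) = {forward['T1']:.3f}, P(T1 | D2-D1) = {reverse['T1']:.3f}")
```

The two query architectures contain exactly the same domains, yet on this toy data the forward order scores T1 above 0.5 and the reversed order scores it below 0.5, because only the ordered-pair features differ. A composition-only model (a bag of domains) could not tell the two queries apart.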

AVAILABILITY

The new models are provided as open source programs at http://sfb.kaust.edu.sa/Pages/Software.aspx.

CONTACT

dkihara@cs.purdue.edu, xin.gao@kaust.edu.sa

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics Online.

Similar articles

CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction.

Bioinformatics. 2016 Jun 15;32(12):i332-i340. doi: 10.1093/bioinformatics/btw271.

DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier.

Bioinformatics. 2018 Feb 15;34(4):660-668. doi: 10.1093/bioinformatics/btx624.

SECOM: a novel hash seed and community detection based-approach for genome-scale protein domain identification.

PLoS One. 2012;7(6):e39475. doi: 10.1371/journal.pone.0039475. Epub 2012 Jun 28.

WaVPeak: picking NMR peaks through wavelet-based smoothing and volume-based filtering.

Bioinformatics. 2012 Apr 1;28(7):914-20. doi: 10.1093/bioinformatics/bts078. Epub 2012 Feb 10.

Domain prediction with probabilistic directional context.

Bioinformatics. 2017 Aug 15;33(16):2471-2478. doi: 10.1093/bioinformatics/btx221.

Gracob: a novel graph-based constant-column biclustering method for mining growth phenotype data.

Bioinformatics. 2017 Aug 15;33(16):2523-2531. doi: 10.1093/bioinformatics/btx199.

FuncPatch: a web server for the fast Bayesian inference of conserved functional patches in protein 3D structures.

Bioinformatics. 2015 Feb 15;31(4):523-31. doi: 10.1093/bioinformatics/btu673. Epub 2014 Oct 15.

A naive Bayes model to predict coupling between seven transmembrane domain receptors and G-proteins.

Bioinformatics. 2003 Jan 22;19(2):234-40. doi: 10.1093/bioinformatics/19.2.234.

PANDA: Protein function prediction using domain architecture and affinity propagation.

Sci Rep. 2018 Feb 22;8(1):3484. doi: 10.1038/s41598-018-21849-1.

Cited by

Domain-PFP allows protein function prediction using function-aware domain embedding representations.

Commun Biol. 2023 Oct 31;6(1):1103. doi: 10.1038/s42003-023-05476-9.

Domain-PFP: Protein Function Prediction Using Function-Aware Domain Embedding Representations.

bioRxiv. 2023 Aug 24:2023.08.23.554486. doi: 10.1101/2023.08.23.554486.

FAS: assessing the similarity between proteins using multi-layered feature architectures.

Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad226.

Phylo-PFP: improved automated protein function prediction using phylogenetic distance of distantly related sequences.

Bioinformatics. 2019 Mar 1;35(5):753-759. doi: 10.1093/bioinformatics/bty704.

Missing gene identification using functional coherence scores.

Sci Rep. 2016 Aug 24;6:31725. doi: 10.1038/srep31725.

CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction.

Bioinformatics. 2016 Jun 15;32(12):i332-i340. doi: 10.1093/bioinformatics/btw271.

UniProt-DAAC: domain architecture alignment and classification, a new method for automatic functional annotation in UniProtKB.

Bioinformatics. 2016 Aug 1;32(15):2264-71. doi: 10.1093/bioinformatics/btw114. Epub 2016 Mar 7.

PatchSurfers: Two methods for local molecular property-based binding ligand prediction.

Methods. 2016 Jan 15;93:41-50. doi: 10.1016/j.ymeth.2015.09.026. Epub 2015 Sep 30.

BMC Bioinformatics. 2015 May 13;16:154. doi: 10.1186/s12859-015-0570-8.

DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe.

BMC Bioinformatics. 2015 Mar 21;16:96. doi: 10.1186/s12859-015-0499-y.
