

ENZYMAP: exploiting protein annotation for modeling and predicting EC number changes in UniProt/Swiss-Prot.

Affiliations

Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil; Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil.

Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil.

Publication information

PLoS One. 2014 Feb 19;9(2):e89162. doi: 10.1371/journal.pone.0089162. eCollection 2014.

Abstract

The volume and diversity of biological data are increasing at very high rates. Vast amounts of protein sequences and structures, protein and genetic interactions, and phenotype studies have been produced. Most of the data generated by high-throughput devices are annotated automatically, because annotating them manually is not feasible. Thus, efficient and precise automatic annotation methods are required to ensure the quality and reliability of both the biological data and the associated annotations. We propose the ENZYMatic Annotation Predictor (ENZYMAP), a technique to characterize and predict EC number changes based on annotations from UniProt/Swiss-Prot using a supervised learning approach. We evaluated ENZYMAP experimentally, using test data sets from both UniProt/Swiss-Prot and UniProt/TrEMBL, and showed that predicting EC changes using selected types of annotation is possible. Finally, we compared the predictions of ENZYMAP and DETECT and checked both against the UniProt/Swiss-Prot annotations. ENZYMAP was shown to be more accurate than DETECT, coming closer to the actual changes in UniProt/Swiss-Prot. Our proposal is intended as an automatic complementary method (one that can be used together with other techniques, such as those based on protein sequence and structure) that helps to improve the quality and reliability of enzyme annotations over time by suggesting possible corrections, anticipating annotation changes, and propagating implicit knowledge across the whole dataset.
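To make the notion of an "EC number change" concrete, here is a minimal sketch, assuming that a change is read off by comparing the EC numbers attached to each protein accession in two releases of UniProt/Swiss-Prot. The function names (`ec_levels`, `first_differing_level`, `diff_ec_annotations`), the four change labels, and the toy accessions are hypothetical illustrations, not ENZYMAP's actual definitions or code. Labels of this kind are the sort of target a supervised model could be trained to predict from annotation features.

```python
from typing import Dict, List, Set


def ec_levels(ec: str) -> List[str]:
    """Split an EC number such as '1.1.1.1' or '3.4.-.-' into its four levels."""
    parts = ec.split(".")
    if len(parts) != 4:
        raise ValueError(f"not a four-level EC number: {ec!r}")
    return parts


def first_differing_level(old: str, new: str) -> int:
    """Return the 1-based EC level at which two EC numbers first differ (0 if identical)."""
    for level, (a, b) in enumerate(zip(ec_levels(old), ec_levels(new)), start=1):
        if a != b:
            return level
    return 0


def diff_ec_annotations(
    old_release: Dict[str, Set[str]], new_release: Dict[str, Set[str]]
) -> Dict[str, str]:
    """Label each accession by how its EC annotation changed between two releases.

    Hypothetical labelling scheme: 'unchanged', 'added', 'removed', 'changed'.
    An accession missing from a release is treated as having no EC numbers there.
    """
    labels: Dict[str, str] = {}
    for acc in sorted(set(old_release) | set(new_release)):
        before = old_release.get(acc, set())
        after = new_release.get(acc, set())
        if before == after:
            labels[acc] = "unchanged"
        elif not before:
            labels[acc] = "added"
        elif not after:
            labels[acc] = "removed"
        else:
            labels[acc] = "changed"
    return labels


if __name__ == "__main__":
    # Toy, made-up releases keyed by hypothetical accessions.
    old = {"PROT_A": {"1.1.1.1"}, "PROT_B": set(), "PROT_C": {"3.4.21.4"}, "PROT_D": {"2.7.11.1"}}
    new = {"PROT_A": {"1.1.1.1"}, "PROT_B": {"2.7.1.1"}, "PROT_C": {"3.4.21.5"}}
    print(diff_ec_annotations(old, new))
    print(first_differing_level("3.4.21.4", "3.4.21.5"))
```

`first_differing_level` indicates how deep in the four-level EC hierarchy (class, subclass, sub-subclass, serial number) a modification goes; for example, `3.4.21.4` and `3.4.21.5` differ only at the fourth level, whereas a change of the first digit moves the enzyme to a different class.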

