Towards region-specific propagation of protein functions.

Affiliations

Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA.

Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA.

Publication information

Bioinformatics. 2019 May 15;35(10):1737-1744. doi: 10.1093/bioinformatics/bty834.

DOI:10.1093/bioinformatics/bty834

PMID:30304483

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC6513163/

Abstract

MOTIVATION

Due to the nature of experimental annotation, most protein function prediction methods operate at the protein level, where functions are assigned to full-length proteins based on overall similarities. However, most proteins function by interacting with other proteins or molecules, and many functional associations should be limited to specific regions rather than the entire protein length. Most domain-centric function prediction methods depend on accurate domain family assignments to infer relationships between domains and functions, with regions that are unassigned to a known domain family left out of functional evaluation. Given the abundance of residue-level annotations currently available, we present a function prediction methodology that automatically infers function labels of specific protein regions using protein-level annotations and multiple types of region-specific features.

RESULTS

We apply this method to local features obtained from InterPro, UniProtKB and amino acid sequences and show that this method improves both the accuracy and region-specificity of protein function transfer and prediction. We compare region-level predictive performance of our method against that of a whole-protein baseline method using proteins with structurally verified binding sites and also compare protein-level temporal holdout predictive performances to expand the variety and specificity of GO terms we could evaluate. Our results can also serve as a starting point to categorize GO terms into region-specific and whole-protein terms and select prediction methods for different classes of GO terms.
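The temporal holdout comparison mentioned above can be illustrated with a minimal sketch. It assumes a common convention, not necessarily the authors' exact protocol: annotations recorded before a cutoff date form the training set, and the test set holds later annotations for proteins that had none before the cutoff. The toy annotations, the cutoff date, and the function name `temporal_holdout` are all hypothetical.

```python
from datetime import date

# Hypothetical toy annotations: (protein_id, GO term, date the annotation was recorded).
ANNOTATIONS = [
    ("P1", "GO:0005524", date(2015, 3, 1)),
    ("P2", "GO:0003677", date(2016, 7, 9)),
    ("P3", "GO:0005524", date(2017, 11, 20)),
    ("P1", "GO:0046872", date(2018, 2, 14)),
]

def temporal_holdout(annotations, cutoff):
    """Split annotations at `cutoff`: earlier ones train, later ones test.

    Assumption: test annotations are kept only for proteins with no
    annotation before the cutoff, so no test protein has training labels.
    """
    train = [a for a in annotations if a[2] < cutoff]
    seen = {protein for protein, _, _ in train}
    test = [a for a in annotations if a[2] >= cutoff and a[0] not in seen]
    return train, test

train, test = temporal_holdout(ANNOTATIONS, date(2017, 1, 1))
print([a[0] for a in train])  # ['P1', 'P2']
print([a[0] for a in test])   # ['P3']
```

Here P1's later annotation is dropped from the test set because P1 already had a label before the cutoff. Other holdout variants keep such proteins and evaluate only the newly gained terms.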

AVAILABILITY AND IMPLEMENTATION

The code and features are freely available at: https://github.com/ek1203/rsfp.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fac0/6513163/fb8830352912/bty834f1.jpg



