Residue Cluster Classes: A Unified Protein Representation for Efficient Structural and Functional Classification.

Authors

Fontove Fernando, Del Rio Gabriel

Affiliations

C3 Consensus, Miguel Hidalgo, CDMX, Mexico City 11510, Mexico.

Department of Biochemistry and Structural Biology, Instituto de Fisiología Celular, UNAM, Mexico City 04510, Mexico.

Publication information

Entropy (Basel). 2020 Apr 20;22(4):472. doi: 10.3390/e22040472.

DOI: 10.3390/e22040472
PMID: 33286246
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7516957/

Abstract

Proteins are characterized by their structures and functions, and these two fundamental aspects of proteins are assumed to be related. To model such a relationship, a single representation to model both protein structure and function would be convenient, yet so far, the most effective models for protein structure or function classification do not rely on the same protein representation. Here we provide a computationally efficient implementation for large datasets to calculate residue cluster classes (RCCs) from protein three-dimensional structures and show that such representations enable a random forest algorithm to effectively learn the structural and functional classifications of proteins, according to the CATH and Gene Ontology criteria, respectively. RCCs are derived from residue contact maps built from different distance criteria, and we show that 7 or 8 Å with or without amino acid side-chain atoms rendered the best classification models. The potential use of a unified representation of proteins is discussed and possible future areas for improvement and exploration are presented.
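To make the role of the distance criterion concrete, the following is a minimal Python sketch of how a residue contact map and cluster counts might be derived from coordinates. It is not the authors' implementation. The abstract does not define how residue clusters are formed or classified, so the sketch assumes one representative atom per residue, treats clusters as maximal cliques of 3 to 6 residues in the contact graph, and groups them by how many of their residues are consecutive in sequence. All function names (`contact_map`, `maximal_cliques`, `sequence_pattern`, `cluster_class_counts`) are hypothetical.

```python
from itertools import combinations
from math import dist


def contact_map(coords, cutoff=8.0):
    """Adjacency sets: residues i and j are in contact when their
    representative atoms lie within `cutoff` angstroms (assumption:
    one representative atom per residue)."""
    neighbors = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if dist(coords[i], coords[j]) <= cutoff:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def maximal_cliques(neighbors):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & neighbors[v], x & neighbors[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(neighbors), set())
    return cliques


def sequence_pattern(clique):
    """Lengths of the runs of consecutive residue indices in a sorted
    clique, largest first; for example [2, 3, 4, 9] gives (3, 1)."""
    runs, length = [], 1
    for a, b in zip(clique, clique[1:]):
        if b == a + 1:
            length += 1
        else:
            runs.append(length)
            length = 1
    runs.append(length)
    return tuple(sorted(runs, reverse=True))


def cluster_class_counts(coords, cutoff=8.0, sizes=(3, 4, 5, 6)):
    """Count cliques of the assumed sizes, keyed by sequence pattern."""
    counts = {}
    for clique in maximal_cliques(contact_map(coords, cutoff)):
        if len(clique) in sizes:
            key = sequence_pattern(clique)
            counts[key] = counts.get(key, 0) + 1
    return counts


# Toy chain: four residues on a line, 3.8 angstroms apart (made-up data).
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
print(cluster_class_counts(chain, cutoff=7.0))  # {}
print(cluster_class_counts(chain, cutoff=8.0))  # {(3,): 2}
```

On this toy chain, moving the cutoff from 7 to 8 Å changes the feature vector from empty to two three-residue clusters, which illustrates why the choice of distance criterion can matter for a downstream classifier. The pairwise loop is quadratic in chain length, so a spatial index would be a sensible substitute for large datasets.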

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab5/7516957/c5d30d6ed220/entropy-22-00472-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab5/7516957/805217acc5d8/entropy-22-00472-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab5/7516957/4ef2c1679fa2/entropy-22-00472-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab5/7516957/29e5a044d6d1/entropy-22-00472-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab5/7516957/ecc67eab4b1f/entropy-22-00472-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab5/7516957/7c3c62ffc633/entropy-22-00472-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab5/7516957/76a4c03990b5/entropy-22-00472-g007.jpg
