The value of protein structure classification information-Surveying the scientific literature.

Authors

Fox Naomi K, Brenner Steven E, Chandonia John-Marc

Affiliations

Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720.

Department of Plant and Microbial Biology, University of California, Berkeley, California, 94720.

Publication information

Proteins. 2015 Nov;83(11):2025-38. doi: 10.1002/prot.24915. Epub 2015 Sep 19.

DOI:10.1002/prot.24915

PMID:26313554

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4609302/

Abstract

The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two-decade-old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein or a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cb3c/5014147/2ba64540f131/PROT-83-2025-g001.jpg
