FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately.

Affiliation

Department of Computer Science, University of Haifa, Mount Carmel, Haifa 31905, Israel.

Publication information

Proc Natl Acad Sci U S A. 2010 Feb 23;107(8):3481-6. doi: 10.1073/pnas.0914097107. Epub 2010 Feb 3.

DOI: 10.1073/pnas.0914097107
PMID: 20133727
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2840415/
Abstract

Fast identification of protein structures that are similar to a specified query structure in the entire Protein Data Bank (PDB) is fundamental in structure and function prediction. We present FragBag: an ultrafast and accurate method for comparing protein structures. We describe a protein structure by the collection of its overlapping short contiguous backbone segments, and discretize this set using a library of fragments. Then, we succinctly represent the protein as a "bag of fragments" (a vector that counts the number of occurrences of each fragment) and measure the similarity between two structures by the similarity between their vectors. Our representation has two additional benefits: (i) it can be used to construct an inverted index, for implementing a fast structural search engine of the entire PDB, and (ii) one can specify a structure as a collection of substructures, without combining them into a single structure; this is valuable for structure prediction, when there are reliable predictions only of parts of the protein. We use receiver operating characteristic (ROC) curve analysis to quantify the success of FragBag in identifying neighbor candidate sets in a dataset of over 2,900 structures. The gold standard is the set of neighbors found by six state-of-the-art structural aligners. Our best FragBag library finds more accurate candidate sets than three other filter methods: SGM, PRIDE, and a method by Zotenko et al. More interestingly, FragBag performs on a par with the computationally expensive yet highly trusted structural aligners STRUCTAL and CE.
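The bag-of-fragments idea and the inverted-index search described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: chains are given as C-alpha coordinates only, each window is assigned to the library fragment with the lowest RMSD after optimal superposition (Kabsch algorithm), cosine similarity stands in for the vector similarity measure, and the fragment library is random toy data. All function names (`superposed_rmsd`, `bag_of_fragments`, `build_inverted_index`, `search`) are hypothetical; this is not the authors' implementation, and their libraries and similarity measure may differ.

```python
import numpy as np
from collections import defaultdict


def superposed_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """RMSD between two (k, 3) point sets after optimal rigid superposition (Kabsch)."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)
    # If the best orthogonal transform is a reflection, flip the smallest singular value
    # so that only proper rotations are allowed.
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        s[-1] = -s[-1]
    sq = (np.sum(a * a) + np.sum(b * b) - 2.0 * np.sum(s)) / len(a)
    return float(np.sqrt(max(sq, 0.0)))


def bag_of_fragments(ca: np.ndarray, library: np.ndarray) -> np.ndarray:
    """Count how many overlapping k-residue windows of a chain map to each library fragment.

    ca: (n, 3) backbone coordinates (assumed C-alpha only).
    library: (m, k, 3) hypothetical fragment library of m fragments, each k residues long.
    """
    m, k, _ = library.shape
    bag = np.zeros(m, dtype=np.int64)
    for start in range(len(ca) - k + 1):
        window = ca[start:start + k]
        # Discretize the window: pick the library fragment with the lowest superposed RMSD.
        best = min(range(m), key=lambda j: superposed_rmsd(window, library[j]))
        bag[best] += 1
    return bag


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two bag vectors (0.0 if either is empty)."""
    denom = float(np.linalg.norm(u) * np.linalg.norm(v))
    return float(u @ v) / denom if denom else 0.0


def build_inverted_index(bags: dict) -> dict:
    """Map each fragment id to the (structure id, count) postings of the bags containing it."""
    index = defaultdict(list)
    for sid, bag in bags.items():
        for frag in np.flatnonzero(bag):
            index[int(frag)].append((sid, int(bag[frag])))
    return index


def search(index: dict, norms: dict, query: np.ndarray, top_k: int = 5) -> list:
    """Rank indexed structures by cosine similarity to `query`.

    Only the postings of fragments present in the query are visited, so structures
    sharing no fragment with the query are never scored.
    """
    dots = defaultdict(float)
    for frag in np.flatnonzero(query):
        for sid, count in index.get(int(frag), []):
            dots[sid] += float(query[frag]) * count
    qnorm = float(np.linalg.norm(query))
    scores = [(sid, d / (qnorm * norms[sid])) for sid, d in dots.items()]
    return sorted(scores, key=lambda item: -item[1])[:top_k]


# Toy usage with random data (hypothetical library: 20 fragments of length 5).
rng = np.random.default_rng(0)
library = rng.normal(size=(20, 5, 3))
chains = {name: rng.normal(size=(40, 3)).cumsum(axis=0) for name in ["A", "B", "C"]}
bags = {name: bag_of_fragments(ca, library) for name, ca in chains.items()}
index = build_inverted_index(bags)
norms = {name: float(np.linalg.norm(bag)) for name, bag in bags.items()}
print(search(index, norms, bags["A"], top_k=3))  # structure "A" itself scores ~1.0

# A query given as separate substructures: its bag is the sum of the pieces' bags.
partial = bag_of_fragments(chains["A"][:15], library) + bag_of_fragments(chains["A"][20:], library)
print(search(index, norms, partial, top_k=3))
```

Summing per-piece bags is what makes benefit (ii) cheap in this sketch: the pieces never have to be assembled into one structure, and windows that would span the gap between pieces are simply not counted. The nearest-fragment assignment here is brute force (one SVD per window per library fragment), which is fine for illustration but is the part a real implementation would need to make fast.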

Similar articles

1. FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately.
Proc Natl Acad Sci U S A. 2010 Feb 23;107(8):3481-6. doi: 10.1073/pnas.0914097107. Epub 2010 Feb 3.
2. Learning structural motif representations for efficient protein structure search.
Bioinformatics. 2018 Sep 1;34(17):i773-i780. doi: 10.1093/bioinformatics/bty585.
3. PDB-UF: database of predicted enzymatic functions for unannotated protein structures from structural genomics.
BMC Bioinformatics. 2006 Feb 6;7:53. doi: 10.1186/1471-2105-7-53.
4. Using Dali for structural comparison of proteins.
Curr Protoc Bioinformatics. 2006 Jul;Chapter 5:Unit 5.5. doi: 10.1002/0471250953.bi0505s14.
5. Recognizing the fold of a protein structure.
Bioinformatics. 2003 Sep 22;19(14):1748-59. doi: 10.1093/bioinformatics/btg240.
6. Retrieving backbone string neighbors provides insights into structural modeling of membrane proteins.
Mol Cell Proteomics. 2012 Jul;11(7):M111.016808. doi: 10.1074/mcp.M111.016808. Epub 2012 Mar 13.
7. Highly accurate and consistent method for prediction of helix and strand content from primary protein sequences.
Artif Intell Med. 2005 Sep-Oct;35(1-2):19-35. doi: 10.1016/j.artmed.2005.02.006.
8. Index-based similarity search for protein structure databases.
J Bioinform Comput Biol. 2004 Mar;2(1):99-126. doi: 10.1142/s0219720004000491.
9. Protein structural similarity search by Ramachandran codes.
BMC Bioinformatics. 2007 Aug 23;8:307. doi: 10.1186/1471-2105-8-307.
10. Flexible structural protein alignment by a sequence of local transformations.
Bioinformatics. 2009 Jul 1;25(13):1625-31. doi: 10.1093/bioinformatics/btp296. Epub 2009 May 5.

Cited by

1. Unsupervised learning reveals landscape of local structural motifs across protein classes.
Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf377.
2. Deep generative models of protein structure uncover distant relationships across a continuous fold space.
Nat Commun. 2024 Sep 16;15(1):8094. doi: 10.1038/s41467-024-52020-2.
3. Persistent homology reveals strong phylogenetic signal in 3D protein structures.
PNAS Nexus. 2024 Apr 17;3(4):pgae158. doi: 10.1093/pnasnexus/pgae158. eCollection 2024 Apr.
4. Sequence-structure-function relationships in the microbial protein universe.
Nat Commun. 2023 Apr 26;14(1):2351. doi: 10.1038/s41467-023-37896-w.
5. Beyond sequence: Structure-based machine learning.
Comput Struct Biotechnol J. 2022 Dec 29;21:630-643. doi: 10.1016/j.csbj.2022.12.039. eCollection 2023.
6. Fast protein structure comparison through effective representation learning with contrastive graph neural networks.
PLoS Comput Biol. 2022 Mar 24;18(3):e1009986. doi: 10.1371/journal.pcbi.1009986. eCollection 2022 Mar.
7. The language of proteins: NLP, machine learning & protein sequences.
Comput Struct Biotechnol J. 2021 Mar 25;19:1750-1758. doi: 10.1016/j.csbj.2021.03.022. eCollection 2021.
8. Quantifying steric hindrance and topological obstruction to protein structure superposition.
Algorithms Mol Biol. 2021 Feb 27;16(1):1. doi: 10.1186/s13015-020-00180-3.
9. Evaluating Autoencoder-Based Featurization and Supervised Learning for Protein Decoy Selection.
Molecules. 2020 Mar 4;25(5):1146. doi: 10.3390/molecules25051146.
10. A Structure-Informed Atlas of Human-Virus Interactions.
Cell. 2019 Sep 5;178(6):1526-1541.e16. doi: 10.1016/j.cell.2019.08.005. Epub 2019 Aug 29.
