

FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately.

Affiliation

Department of Computer Science, University of Haifa, Mount Carmel, Haifa 31905, Israel.

Publication information

Proc Natl Acad Sci U S A. 2010 Feb 23;107(8):3481-6. doi: 10.1073/pnas.0914097107. Epub 2010 Feb 3.

Abstract

Fast identification of protein structures that are similar to a specified query structure in the entire Protein Data Bank (PDB) is fundamental in structure and function prediction. We present FragBag, an ultrafast and accurate method for comparing protein structures. We describe a protein structure by the collection of its overlapping short contiguous backbone segments, and discretize this set using a library of fragments. Then, we succinctly represent the protein as a "bag-of-fragments" (a vector that counts the number of occurrences of each fragment) and measure the similarity between two structures by the similarity between their vectors. Our representation has two additional benefits: (i) it can be used to construct an inverted index, for implementing a fast structural search engine of the entire PDB, and (ii) one can specify a structure as a collection of substructures, without combining them into a single structure; this is valuable for structure prediction, when there are reliable predictions only of parts of the protein. We use receiver operating characteristic curve analysis to quantify the success of FragBag in identifying neighbor candidate sets in a dataset of over 2,900 structures. The gold standard is the set of neighbors found by six state-of-the-art structural aligners. Our best FragBag library finds more accurate candidate sets than the three other filter methods: the SGM, PRIDE, and a method by Zotenko et al. More interestingly, FragBag performs on a par with the computationally expensive, yet highly trusted, structural aligners STRUCTAL and CE.
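The pipeline described in the abstract (overlapping backbone windows, assignment to a fragment library, a count vector, vector similarity, and an inverted index) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the abstract does not say how a segment is matched to a library fragment, so this sketch swaps in a simple superposition-invariant descriptor (the list of pairwise distances inside the segment); cosine similarity is only one possible vector similarity; and the function names, the two-fragment toy library, and the coordinates are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def internal_distances(segment):
    """Assumed descriptor: all pairwise distances between points of a segment.

    It is unchanged by rotation and translation, so no superposition is needed.
    """
    return [euclidean(a, b) for i, a in enumerate(segment) for b in segment[i + 1:]]


def nearest_fragment(segment, library):
    """Index of the library fragment whose descriptor is closest to the segment's."""
    d = internal_distances(segment)
    return min(range(len(library)),
               key=lambda k: euclidean(d, internal_distances(library[k])))


def bag_of_fragments(backbone, library, frag_len):
    """Count, for each library fragment, the overlapping windows it best matches."""
    counts = Counter()
    for start in range(len(backbone) - frag_len + 1):
        window = backbone[start:start + frag_len]
        counts[nearest_fragment(window, library)] += 1
    return [counts[k] for k in range(len(library))]


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_inverted_index(bags):
    """Map each fragment index to the set of structure names that contain it."""
    index = defaultdict(set)
    for name, bag in bags.items():
        for k, count in enumerate(bag):
            if count:
                index[k].add(name)
    return index


# Hypothetical toy library of two 3-point fragments: one straight, one bent.
LIBRARY = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0)],
    [(0, 0, 0), (1, 0, 0), (1, 1, 0)],
]

# Hypothetical toy "backbones" (one point per residue).
STRUCT_A = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (3, 1, 0)]
STRUCT_B = [(0, 0, 0), (0, 1, 0), (0, 2, 0), (0, 3, 0)]

BAGS = {
    "A": bag_of_fragments(STRUCT_A, LIBRARY, 3),
    "B": bag_of_fragments(STRUCT_B, LIBRARY, 3),
}
INDEX = build_inverted_index(BAGS)
SIMILARITY = cosine_similarity(BAGS["A"], BAGS["B"])
```

In this toy setup, a query only needs to be compared against structures that the inverted index lists under at least one of the query's own fragments, which is the idea behind using the representation as a fast pre-filter before any expensive structural alignment.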


