Searching protein 3-D structures in linear time.

Author information

Shibuya Tetsuo

Affiliation

Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan.

Publication information

J Comput Biol. 2010 Mar;17(3):203-19. doi: 10.1089/cmb.2009.0148.

DOI: 10.1089/cmb.2009.0148
PMID: 20377441

Abstract

One of the most important issues in post-genomic molecular biology is the analysis of protein three-dimensional (3-D) structures, and searching 3-D structure databases is becoming more and more important. The root mean square deviation (RMSD) is the most popular similarity measure for comparing two molecular structures. In this article, we propose new theoretically and practically fast algorithms for the basic problem of finding all substructures in a structure database of chain molecules (such as proteins) whose RMSDs to the query are within a given constant threshold. The best-known worst-case time complexity for the problem is O(N log m), where N is the database size and m is the query size. The previous best-known expected time complexity for the problem is also O(N log m). We also propose a new breakthrough linear-expected-time algorithm. It is not only a theoretically significant improvement over previous algorithms but also faster in practice, according to computational experiments. Our experiments over the whole Protein Data Bank (PDB) show that, when searching for similar substructures whose RMSDs are within 1 Å of queries of ordinary lengths, our algorithm is 3.6-28 times faster than previously known algorithms. We also propose a series of preprocessing algorithms that enable faster queries, although no indexing algorithm has previously been known whose query time complexity is better than the above O(N log m) bound. One is an O(N log² N)-time and O(N log N)-space preprocessing algorithm with an expected query time complexity of O(m + N/√m). Another is an O(N log N)-time and O(N)-space preprocessing algorithm with an expected query time complexity of O(N/√m + m log(N/m)).
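
To make the search problem in the abstract concrete, here is a minimal sketch of a naive baseline: slide a window of the query's length m along a chain of N points and report every start position whose RMSD to the query, after optimal rigid superposition, is within the threshold. This is not the article's algorithm. It is a straightforward O(Nm) scan, and it assumes the database is a single chain given as a list of (x, y, z) tuples. The function names (`det3`, `sym3_eigenvalues`, `rmsd`, `search`) and the toy coordinates are illustrative only. The minimum RMSD is taken from the singular values of the 3x3 covariance matrix (the Kabsch formulation), computed here with a closed-form eigenvalue formula so that only the standard library is needed.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def sym3_eigenvalues(a):
    """Eigenvalues of a symmetric 3x3 matrix, largest first (closed form)."""
    off = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if off == 0.0:  # matrix is already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p = math.sqrt((sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * off) / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    r = max(-1.0, min(1.0, det3(b) / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [e1, 3.0 * q - e1 - e3, e3]


def centered(points):
    """Translate points so that their centroid sits at the origin."""
    n = len(points)
    c = [sum(pt[k] for pt in points) / n for k in range(3)]
    return [(pt[0] - c[0], pt[1] - c[1], pt[2] - c[2]) for pt in points]


def rmsd(p, q):
    """Minimum RMSD between equal-length point lists over rotations and translations."""
    p, q = centered(p), centered(q)
    # Covariance matrix H = sum_i p_i q_i^T.
    h = [[sum(a[i] * b[j] for a, b in zip(p, q)) for j in range(3)]
         for i in range(3)]
    hth = [[sum(h[k][i] * h[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
    # Singular values of H are the square roots of the eigenvalues of H^T H.
    s = [math.sqrt(max(e, 0.0)) for e in sym3_eigenvalues(hth)]
    if det3(h) < 0.0:
        s[2] = -s[2]  # exclude reflections: flip the smallest singular value
    e0 = sum(x * x for pt in p for x in pt) + sum(x * x for pt in q for x in pt)
    return math.sqrt(max(e0 - 2.0 * sum(s), 0.0) / len(p))


def search(chain, query, threshold):
    """Naive O(N*m) scan: start indices of windows within the RMSD threshold."""
    m = len(query)
    return [i for i in range(len(chain) - m + 1)
            if rmsd(chain[i:i + m], query) <= threshold]


# Toy data (made up): the query is the window starting at index 3,
# rotated 90 degrees about the z-axis and then shifted.
chain = [(3.0, 0.5, -1.0), (2.0, 0.0, -1.0), (1.0, -1.0, 0.0),
         (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0),
         (1.0, 1.0, 1.0), (0.0, 1.0, 1.0),
         (-1.0, 2.0, 1.5), (-2.0, 2.0, 3.0)]
query = [(-y + 5.0, x - 2.0, z + 1.0) for x, y, z in chain[3:8]]
hits = search(chain, query, threshold=0.05)
```

Each window costs O(m) here, so the scan as a whole is O(Nm). The O(N log m) and linear-expected-time bounds quoted in the abstract are improvements over this kind of baseline.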

Similar articles

1
Searching protein 3-D structures in linear time.
J Comput Biol. 2010 Mar;17(3):203-19. doi: 10.1089/cmb.2009.0148.
2
Searching protein three-dimensional structures in faster than linear time.
J Comput Biol. 2010 Apr;17(4):593-602. doi: 10.1089/cmb.2009.0217.
3
Efficient substructure RMSD query algorithms.
J Comput Biol. 2007 Nov;14(9):1201-7. doi: 10.1089/cmb.2007.0079.
4
LB3D: a protein three-dimensional substructure search program based on the lower bound of a root mean square deviation value.
J Comput Biol. 2012 May;19(5):493-503. doi: 10.1089/cmb.2011.0230. Epub 2012 Apr 17.
5
Linear-time protein 3-D structure searching with insertions and deletions.
Algorithms Mol Biol. 2010 Jan 4;5:7. doi: 10.1186/1748-7188-5-7.
6
Fast hinge detection algorithms for flexible protein structures.
IEEE/ACM Trans Comput Biol Bioinform. 2010 Apr-Jun;7(2):333-41. doi: 10.1109/TCBB.2008.62.
7
Finding All Longest Common Segments in Protein Structures Efficiently.
IEEE/ACM Trans Comput Biol Bioinform. 2015 May-Jun;12(3):644-55. doi: 10.1109/TCBB.2014.2372782.
8
Effective optimization algorithms for fragment-assembly based protein structure prediction.
Comput Syst Bioinformatics Conf. 2006:19-29.
9
Searching protein 3-D structures for optimal structure alignment using intelligent algorithms and data structures.
IEEE Trans Inf Technol Biomed. 2010 Nov;14(6):1378-86. doi: 10.1109/TITB.2010.2079939. Epub 2010 Sep 27.
10
Fast exact algorithms for the closest string and substring problems with application to the planted (L, d)-motif model.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Sep-Oct;8(5):1400-10. doi: 10.1109/TCBB.2011.21.

Cited by

1
Rapid search for tertiary fragments reveals protein sequence-structure relationships.
Protein Sci. 2015 Apr;24(4):508-24. doi: 10.1002/pro.2610. Epub 2014 Dec 31.
2
Linear-time protein 3-D structure searching with insertions and deletions.
Algorithms Mol Biol. 2010 Jan 4;5:7. doi: 10.1186/1748-7188-5-7.