RCSB Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, California, USA.
RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA.
PLoS Comput Biol. 2020 Dec 7;16(12):e1008502. doi: 10.1371/journal.pcbi.1008502. eCollection 2020 Dec.
The biochemical and biological functions of proteins arise from both the overall fold of the polypeptide chain and, typically, structural motifs: small sets of amino acids that form a catalytic center or a binding site and that may be far apart from one another in the amino acid sequence. Detecting such structural motifs can provide valuable insights into the function(s) of previously uncharacterized proteins. Technically, this remains an extremely challenging problem because of the size of the Protein Data Bank (PDB) archive. Existing methods depend on clustering by sequence similarity and can be computationally slow. We have developed a new approach that uses an inverted index strategy capable of analyzing >170,000 PDB structures with unmatched speed. The efficiency of the inverted index method depends critically on identifying the small number of structures that contain the query motif and ignoring the large majority of structures, which are irrelevant. Our approach (implemented at motif.rcsb.org) enables real-time retrieval and superposition of structural motifs, either extracted from a reference structure or uploaded by the user. Herein, we describe the method and present five case studies that exemplify its efficacy and speed in analyzing 3D structures of both proteins and nucleic acids.
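The inverted index idea described above can be illustrated with a minimal Python sketch. It is not the implementation behind motif.rcsb.org: the descriptor used here (a pair of residue labels plus a binned distance between one representative point per residue), the bin width, the distance cutoff, and all function and variable names are assumptions chosen for illustration. The point is only that a query touches the posting lists of its own descriptors, so structures that lack any of them are never examined.

```python
from collections import defaultdict
from itertools import combinations
from math import dist  # Python 3.8+

BIN_WIDTH = 1.0      # distance bin width in angstroms (assumed value)
MAX_DISTANCE = 15.0  # ignore residue pairs farther apart than this (assumed value)


def pair_descriptors(residues):
    """Return the set of (label_a, label_b, distance_bin) descriptors.

    `residues` is a list of (residue_label, (x, y, z)) tuples, with one
    representative coordinate per residue (a simplification).
    """
    descriptors = set()
    for (label_a, xyz_a), (label_b, xyz_b) in combinations(residues, 2):
        d = dist(xyz_a, xyz_b)
        if d <= MAX_DISTANCE:
            first, second = sorted((label_a, label_b))  # order-independent key
            descriptors.add((first, second, round(d / BIN_WIDTH)))
    return descriptors


def build_index(structures):
    """Map each descriptor to the set of structure IDs that contain it."""
    index = defaultdict(set)
    for structure_id, residues in structures.items():
        for descriptor in pair_descriptors(residues):
            index[descriptor].add(structure_id)
    return index


def candidate_structures(index, motif):
    """Intersect the posting sets of every descriptor in the query motif.

    Only exact bin matches are considered; a tolerant search would also
    look up neighboring distance bins.
    """
    postings = [index.get(d, set()) for d in pair_descriptors(motif)]
    if not postings:
        return set()
    postings.sort(key=len)  # start from the rarest descriptor
    return set.intersection(*postings)


# Toy data with hypothetical structure IDs and coordinates.
structures = {
    "A": [("HIS", (0, 0, 0)), ("ASP", (3, 0, 0)), ("SER", (0, 4, 0))],
    "B": [("HIS", (0, 0, 0)), ("ASP", (3, 0, 0)), ("GLY", (0, 4, 0))],
    "C": [("ALA", (0, 0, 0)), ("GLY", (5, 0, 0))],
}
index = build_index(structures)

# The query motif is the same triad as in "A", translated in space.
motif = [("HIS", (10, 10, 10)), ("ASP", (13, 10, 10)), ("SER", (10, 14, 10))]
hits = candidate_structures(index, motif)
print(sorted(hits))
```

Because the descriptors depend only on pairwise distances, the translated query still matches structure "A". The surviving candidates would then need a geometric check, such as superposition onto the query, since sharing all pairwise descriptors does not by itself guarantee the same 3D arrangement.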