FP-Zernike: An Open-source Structural Database Construction Toolkit for Fast Structure Retrieval.

Affiliations

Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China.

BioMap Research, Menlo Park, CA 94025, USA.

Publication information

Genomics Proteomics Bioinformatics. 2024 May 9;22(1). doi: 10.1093/gpbjnl/qzae007.

DOI: 10.1093/gpbjnl/qzae007
PMID: 38894604
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11423855/
Abstract

The release of AlphaFold2 has sparked a rapid expansion in protein model databases. Efficient protein structure retrieval is crucial for the analysis of structure models, while measuring the similarity between structures is the key challenge in structural retrieval. Although existing structure alignment algorithms can address this challenge, they are often time-consuming. Currently, the state-of-the-art approach involves converting protein structures into three-dimensional (3D) Zernike descriptors and assessing similarity using Euclidean distance. However, the methods for computing 3D Zernike descriptors mainly rely on structural surfaces and are predominantly web-based, thus limiting their application in studying custom datasets. To overcome this limitation, we developed FP-Zernike, a user-friendly toolkit for computing different types of Zernike descriptors based on feature points. Users simply need to enter a single line of command to calculate the Zernike descriptors of all structures in customized datasets. FP-Zernike outperforms the leading method in terms of retrieval accuracy and binary classification accuracy across diverse benchmark datasets. In addition, we showed the application of FP-Zernike in the construction of the descriptor database and the protocol used for the Protein Data Bank (PDB) dataset to facilitate the local deployment of this tool for interested readers. Our demonstration contained 590,685 structures, and at this scale, our system required only 4-9 s to complete a retrieval. The experiments confirmed that it achieved the state-of-the-art accuracy level. FP-Zernike is an open-source toolkit, with the source code and related data accessible at https://ngdc.cncb.ac.cn/biocode/tools/BT007365/releases/0.1, as well as through a webserver at http://www.structbioinfo.cn/.
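
The retrieval idea in the abstract reduces each structure to a fixed-length descriptor vector and scores similarity by Euclidean distance. The sketch below shows that idea as a brute-force k-nearest-neighbour search, plus a distance-threshold rule for a binary (similar / not similar) decision. It assumes the descriptors have already been computed, for example by FP-Zernike, and loaded as equal-length lists of floats. The functions `retrieve` and `is_similar`, the threshold, and the structure IDs are hypothetical and are not FP-Zernike's API. The abstract does not say how the tool indexes its database or how it makes its classification decision.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

# A precomputed descriptor: a fixed-length vector of floats (assumption).
Descriptor = Sequence[float]


def retrieve(
    query: Descriptor, database: Dict[str, Descriptor], k: int = 5
) -> List[Tuple[float, str]]:
    """Return up to k (distance, structure_id) pairs closest to `query`, nearest first.

    Brute-force scan: the Euclidean distance to every stored descriptor is
    computed with math.dist, and heapq.nsmallest keeps only the k smallest.
    """
    return heapq.nsmallest(
        k, ((math.dist(query, vec), sid) for sid, vec in database.items())
    )


def is_similar(a: Descriptor, b: Descriptor, threshold: float) -> bool:
    """Hypothetical binary classifier: two structures count as similar when the
    Euclidean distance between their descriptors is at most `threshold`."""
    return math.dist(a, b) <= threshold


if __name__ == "__main__":
    # Toy 3-dimensional "descriptors" with made-up IDs; real 3D Zernike
    # descriptors are much longer vectors.
    db = {
        "1abc_A": [0.10, 0.20, 0.30],
        "2xyz_B": [0.90, 0.80, 0.70],
        "3def_C": [0.12, 0.21, 0.29],
    }
    query = [0.10, 0.20, 0.30]
    for dist, sid in retrieve(query, db, k=2):
        print(f"{sid}\t{dist:.4f}")
    print(is_similar(query, db["3def_C"], threshold=0.05))
```

A linear scan is the simplest correct baseline. It is shown here only to illustrate distance-based retrieval, not as a claim about how FP-Zernike reaches the reported 4-9 s per query over 590,685 structures.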

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6dd7/11423855/1962d244e007/qzae007f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6dd7/11423855/f4a45189dcae/qzae007f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6dd7/11423855/b2db948d0aa8/qzae007f3.jpg
