Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China.
BioMap Research, Menlo Park, CA 94025, USA.
Genomics Proteomics Bioinformatics. 2024 May 9;22(1). doi: 10.1093/gpbjnl/qzae007.
The release of AlphaFold2 has sparked a rapid expansion in protein model databases. Efficient protein structure retrieval is crucial for analyzing structure models, and measuring the similarity between structures is its key challenge. Existing structure alignment algorithms can address this challenge, but they are often time-consuming. The current state-of-the-art approach converts protein structures into three-dimensional (3D) Zernike descriptors and assesses similarity using the Euclidean distance between them. However, methods for computing 3D Zernike descriptors mainly rely on structural surfaces and are predominantly web-based, which limits their application to custom datasets. To overcome this limitation, we developed FP-Zernike, a user-friendly toolkit for computing different types of Zernike descriptors based on feature points. A single command suffices to calculate the Zernike descriptors of all structures in a customized dataset. FP-Zernike outperforms the leading method in retrieval accuracy and binary classification accuracy across diverse benchmark datasets. In addition, we demonstrate the use of FP-Zernike to construct a descriptor database and describe the protocol used for the Protein Data Bank (PDB) dataset, to facilitate local deployment of the tool by interested readers. Our demonstration contained 590,685 structures, and at this scale our system required only 4-9 s to complete a retrieval. The experiments confirmed that it achieved state-of-the-art accuracy. FP-Zernike is an open-source toolkit, with the source code and related data accessible at https://ngdc.cncb.ac.cn/biocode/tools/BT007365/releases/0.1, as well as through a webserver at http://www.structbioinfo.cn/.
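The retrieval step described above, ranking structures by the Euclidean distance between their descriptors, can be illustrated with a minimal sketch. It is not FP-Zernike's implementation: the function names `euclidean` and `retrieve`, the structure names, and the three-component vectors are hypothetical placeholders, and real Zernike descriptors are much longer vectors produced by the toolkit itself.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, database, top_k=3):
    """Return the top_k (name, distance) pairs closest to the query descriptor.

    `database` maps a structure name to its precomputed descriptor vector.
    This is a brute-force linear scan, used here only for illustration.
    """
    scored = [(euclidean(query, vec), name) for name, vec in database.items()]
    scored.sort()  # smallest distance first
    return [(name, dist) for dist, name in scored[:top_k]]


# Toy descriptor database (hypothetical values, not real Zernike descriptors).
toy_db = {
    "struct_A": [0.10, 0.20, 0.30],
    "struct_B": [0.90, 0.80, 0.70],
    "struct_C": [0.12, 0.19, 0.33],
}

hits = retrieve([0.11, 0.20, 0.31], toy_db, top_k=2)
for name, dist in hits:
    print(f"{name}\t{dist:.4f}")
```

Because every structure is reduced to a fixed-length vector in advance, a query needs only one distance computation per database entry rather than a pairwise structure alignment, which is why descriptor-based retrieval scales to large collections.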