Marques Rômulo S, Souza Michael, Batista Fernando, Gonçalves Miguel, Lavor Carlile
Instituto de Matemática, Estatística e Computação Científica, Universidade Estadual de Campinas, Campinas 13083-859, Brazil.
Departamento de Estatística e Matemática Aplicada, Centro de Ciências, Universidade Federal do Ceará, Fortaleza 60020-181, Brazil.
J Chem Inf Model. 2025 Jan 13;65(1):427-434. doi: 10.1021/acs.jcim.4c00427. Epub 2024 Nov 13.
Determining the three-dimensional shape of protein molecules from interatomic distance information obtained by nuclear magnetic resonance (NMR) can be modeled as a discretizable molecular distance geometry problem (DMDGP). Because of its combinatorial nature, the problem is conventionally solved in the literature by a depth-first search in a binary tree. In this work, we introduce a new search strategy, which we call frequency-based search (FBS), that for the first time uses geometric information contained in the Protein Data Bank (PDB). We encode the geometric configurations of 14,382 molecules derived from NMR experiments in the PDB as binary strings. The results show that the binary strings extracted from the PDB are not uniformly distributed. Furthermore, we compare the runtime needed to find a solution by the symmetry-based build-up (SBBU) algorithm (the most efficient method in the literature for solving the DMDGP) when it is combined with FBS and when it is combined with depth-first search (DFS), and find that FBS performs better in about 70% of the cases.
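The idea of ordering the search by observed frequency, rather than by a fixed depth-first order, can be illustrated with a toy sketch. This is not the authors' FBS or SBBU implementation: the function names, the sample strings in `observed`, and the `is_feasible` test are all hypothetical, and the sketch only assumes that candidate configurations are binary strings of a fixed length and that some oracle can decide whether a string is a solution.

```python
from collections import Counter
from itertools import product


def dfs_order(n):
    """Yield all binary strings of length n in the order in which a
    depth-first search that tries branch '0' before '1' reaches the leaves."""
    for bits in product("01", repeat=n):
        yield "".join(bits)


def frequency_order(n, observed):
    """Yield strings seen in `observed` first, most frequent first,
    then every remaining string in depth-first order.
    `observed` stands in for strings extracted from a database (assumption)."""
    counts = Counter(s for s in observed if len(s) == n)
    ranked = [s for s, _ in counts.most_common()]
    seen = set(ranked)
    yield from ranked
    for s in dfs_order(n):
        if s not in seen:
            yield s


def candidates_tested(order, is_feasible):
    """Return how many candidates are tested until the first feasible one,
    or None if no candidate is feasible."""
    for tested, s in enumerate(order, start=1):
        if is_feasible(s):
            return tested
    return None


# Hypothetical sample: "101" is the most frequent string, then "110", then "000".
observed = ["101", "101", "101", "110", "110", "000"]


def is_target(s):
    # Hypothetical feasibility test: the solution is the string "101".
    return s == "101"


dfs_cost = candidates_tested(dfs_order(3), is_target)
fbs_cost = candidates_tested(frequency_order(3, observed), is_target)
print(dfs_cost, fbs_cost)
```

In this toy setting the frequency ordering pays off only when the solution is among the frequently observed strings, which is why a non-uniform distribution of the extracted strings matters. When the solution was never observed, the sketch falls back to the depth-first order and tests no more candidates than plain DFS would.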