IEEE/ACM Trans Comput Biol Bioinform. 2024 May-Jun;21(3):508-515. doi: 10.1109/TCBB.2024.3367789. Epub 2024 Jun 5.
Identifying conserved (similar) three-dimensional patterns among a set of proteins can be helpful for the rational design of polypharmacological drugs. Some available tools support this identification only from a limited perspective, because they consider only information that is already available, such as known binding sites or previously annotated structural motifs. As a result, these approaches do not search for similarities among all putative orthosteric and/or allosteric binding sites across protein structures. To overcome this limitation, Geomfinder was developed: an algorithm that estimates the similarity between all pairs of three-dimensional amino acid patterns detected in any two given protein structures, without requiring information about their known patterns. Although Geomfinder is a functional option for comparing small protein structures, it is computationally infeasible for large proteins, and its performance needs to improve. This work presents several parallel versions of Geomfinder that exploit symmetric multiprocessors (SMPs), distributed-memory systems, hybrid SMP and distributed-memory systems, and GPU-based systems. Results show significant performance improvements over the original version, with speedups of up to 24.5x when analyzing proteins of average size and up to 95.4x for larger proteins.
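The all-pairs comparison described above is a natural target for parallelism, because each pair of patterns can be scored independently. The following is a minimal sketch of that idea, not Geomfinder's actual code or scoring function. The names `descriptor`, `similarity`, and `compare_all` are hypothetical, and the distance-profile score is an assumption chosen only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import dist


def descriptor(pattern):
    """Sorted pairwise distances between the 3D points of one pattern."""
    return sorted(dist(a, b) for a, b in combinations(pattern, 2))


def similarity(d1, d2):
    """Hypothetical score in (0, 1]; 1.0 means identical distance profiles."""
    n = min(len(d1), len(d2))
    if n == 0:
        return 0.0
    # zip stops at the shorter profile, so exactly n terms are summed.
    mean_diff = sum(abs(x - y) for x, y in zip(d1, d2)) / n
    return 1.0 / (1.0 + mean_diff)


def compare_all(patterns_a, patterns_b, workers=4):
    """Score every pattern of protein A against every pattern of protein B.

    Rows of the result matrix are independent, so each row is handed to a
    worker. Threads keep this sketch portable; for CPU-bound work in CPython
    a process pool, MPI ranks, or a GPU kernel would be needed to see a
    real speedup.
    """
    desc_a = [descriptor(p) for p in patterns_a]
    desc_b = [descriptor(p) for p in patterns_b]

    def row(da):
        return [similarity(da, db) for db in desc_b]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(row, desc_a))
```

Calling `compare_all(patterns_a, patterns_b)` with two lists of patterns, each pattern being a list of `(x, y, z)` tuples, returns a matrix with one row per pattern in the first protein and one column per pattern in the second. Splitting the work by rows is one simple decomposition; the parallel versions reported in the paper may partition the work differently.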