探索基于编辑距离的 motif 搜索的可扩展并行化。

Exploring Scalable Parallelization for Edit Distance-Based Motif Search.

出版信息

IEEE/ACM Trans Comput Biol Bioinform. 2023 Mar-Apr;20(2):1587-1593. doi: 10.1109/TCBB.2022.3208867.

DOI:10.1109/TCBB.2022.3208867

Abstract

Motif Searching is an important problem that can reveal crucial information from biological data. Since the general motif searching is NP-hard and the volume of biological data is growing exponentially in recent years, there is a pressing need for developing time and space-efficient algorithms to find motifs. In this paper, we explore scalable parallelization for Edit Distance-Based Motif Search (EMS). We introduce two parallel designs, recursEMS which integrates the existing EMS solver into a parallel recursion tree running in multiple processes, and parEMS that presents a novel thread-based method which avoids the storage of redundant motif candidates. To make the parallel designs practical, we implement SPEMS, a Scalability-sensitive Parallel solver for EMS. For any given biological dataset and search instance, SPEMS can provide an EMS parallelization towards the optimal performance, or a sub-optimal performance but being more space efficient. Evaluations on two real-world DNA dataset TRANSFAC and ChIP-seq show that SPEMS can obtain 10× geometric mean speedup over the state-of-the-art at the expense of no less than 74.7% memory overheads, or provide 2.2× geometric mean speedup with the possibility of consuming less memory, when running on a 48-core machine.

摘要

模体搜索是一个重要的问题，可以从生物数据中揭示关键信息。由于一般的模体搜索是 NP 难问题，并且近年来生物数据的数量呈指数级增长，因此迫切需要开发时间和空间效率高的算法来寻找模体。在本文中，我们探索了基于编辑距离的模体搜索（EMS）的可扩展并行化。我们引入了两种并行设计，recursEMS 将现有的 EMS 求解器集成到多个进程中运行的并行递归树中，parEMS 提出了一种新的基于线程的方法，避免了冗余模体候选的存储。为了使并行设计实用化，我们实现了 SPEMS，这是一种针对 EMS 的可扩展并行求解器。对于任何给定的生物数据集和搜索实例，SPEMS 都可以提供 EMS 并行化，以实现最佳性能，或者提供次优性能但更节省空间。在两个真实的 DNA 数据集 TRANSFAC 和 ChIP-seq 上的评估表明，SPEMS 可以在不超过 74.7%的内存开销的情况下，获得比最先进的方法快 10 倍的几何平均加速，或者在可能消耗更少内存的情况下提供 2.2 倍的几何平均加速，当在 48 核机器上运行时。

Suppr 超能文献

文献检索

文件翻译

深度研究

Suppr 超能文献

文献检索

文件翻译

深度研究

探索基于编辑距离的 motif 搜索的可扩展并行化。

Exploring Scalable Parallelization for Edit Distance-Based Motif Search.

出版信息

相似文献

引用本文的文献

探索基于编辑距离的 motif 搜索的可扩展并行化。

Exploring Scalable Parallelization for Edit Distance-Based Motif Search.

出版信息

相似文献

引用本文的文献