• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

一种基于OpenMP的生物信息学中查找最长公共子序列的工具。

An OpenMP-based tool for finding longest common subsequence in bioinformatics.

作者信息

Shikder Rayhan, Thulasiraman Parimala, Irani Pourang, Hu Pingzhao

机构信息

Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada.

Department of Biochemistry and Medical Genetics and The George and Fay Yee Centre for Healthcare Innovation, University of Manitoba, Room 308-Basic Medical Sciences Building, 745 Bannatyne Avenue, Winnipeg, MB, R3E 0J9, Canada.

出版信息

BMC Res Notes. 2019 Apr 11;12(1):220. doi: 10.1186/s13104-019-4256-6.

DOI:10.1186/s13104-019-4256-6
PMID:30971295
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC6458724/
Abstract

OBJECTIVE

Finding the longest common subsequence (LCS) among sequences is NP-hard. This is an important problem in bioinformatics for DNA sequence alignment and pattern discovery. In this research, we propose new CPU-based parallel implementations that can provide significant advantages in terms of execution times, monetary cost, and pervasiveness in finding LCS of DNA sequences in an environment where Graphics Processing Units are not available. For general purpose use, we also make the OpenMP-based tool publicly available to end users.

RESULT

In this study, we develop three novel parallel versions of the LCS algorithm on: (i) distributed memory machine using message passing interface (MPI); (ii) shared memory machine using OpenMP, and (iii) hybrid platform that utilizes both distributed and shared memory using MPI-OpenMP. The experimental results with both simulated and real DNA sequence data show that the shared memory OpenMP implementation provides at least two-times absolute speedup than the best sequential version of the algorithm and a relative speedup of almost 7. We provide a detailed comparison of the execution times among the implementations on different platforms with different versions of the algorithm. We also show that removing branch conditions negatively affects the performance of the CPU-based parallel algorithm on OpenMP platform.

摘要

目的

在序列中寻找最长公共子序列(LCS)是NP难问题。这是生物信息学中DNA序列比对和模式发现的一个重要问题。在本研究中,我们提出了基于CPU的新并行实现方法,在没有图形处理单元的环境下,这些方法在执行时间、成本和普及性方面具有显著优势,可用于寻找DNA序列的LCS。为了便于通用,我们还向终端用户公开了基于OpenMP的工具。

结果

在本研究中,我们在以下平台上开发了LCS算法的三个新颖并行版本:(i)使用消息传递接口(MPI)的分布式内存机器;(ii)使用OpenMP的共享内存机器;(iii)利用MPI-OpenMP同时使用分布式和共享内存的混合平台。对模拟和真实DNA序列数据的实验结果表明,共享内存OpenMP实现比该算法的最佳顺序版本提供了至少两倍的绝对加速比,相对加速比接近7。我们详细比较了不同平台上不同版本算法实现之间的执行时间。我们还表明,去除分支条件会对基于CPU的并行算法在OpenMP平台上的性能产生负面影响。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3433/6458724/84ffcbe273f5/13104_2019_4256_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3433/6458724/c6f4895c0a6b/13104_2019_4256_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3433/6458724/84ffcbe273f5/13104_2019_4256_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3433/6458724/c6f4895c0a6b/13104_2019_4256_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3433/6458724/84ffcbe273f5/13104_2019_4256_Fig2_HTML.jpg

相似文献

1
An OpenMP-based tool for finding longest common subsequence in bioinformatics.一种基于OpenMP的生物信息学中查找最长公共子序列的工具。
BMC Res Notes. 2019 Apr 11;12(1):220. doi: 10.1186/s13104-019-4256-6.
2
A new algorithm for "the LCS problem" with application in compressing genome resequencing data.一种应用于压缩基因组重测序数据的“最长公共子序列问题”新算法。
BMC Genomics. 2016 Aug 18;17 Suppl 4(Suppl 4):544. doi: 10.1186/s12864-016-2793-0.
3
DecGPU: distributed error correction on massively parallel graphics processing units using CUDA and MPI.DecGPU:使用 CUDA 和 MPI 在大规模并行图形处理单元上进行分布式错误纠正。
BMC Bioinformatics. 2011 Mar 29;12:85. doi: 10.1186/1471-2105-12-85.
4
A fast parallel algorithm for finding the longest common sequence of multiple biosequences.一种用于寻找多个生物序列最长公共序列的快速并行算法。
BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S4. doi: 10.1186/1471-2105-7-S4-S4.
5
Deposition and extension approach to find longest common subsequence for thousands of long sequences.沉积和扩展方法来寻找数千个长序列的最长公共子序列。
Comput Biol Chem. 2010 Jun;34(3):149-57. doi: 10.1016/j.compbiolchem.2010.05.001. Epub 2010 May 11.
6
An improved distance matrix computation algorithm for multicore clusters.一种用于多核集群的改进型距离矩阵计算算法。
Biomed Res Int. 2014;2014:406178. doi: 10.1155/2014/406178. Epub 2014 Jun 12.
7
Exemplar longest common subsequence.示例最长公共子序列。
IEEE/ACM Trans Comput Biol Bioinform. 2007 Oct-Dec;4(4):535-43. doi: 10.1109/TCBB.2007.1066.
8
Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.使用基于新分块方法的并行编程模型对SMP节点和工作站集群上的并行算法进行分析,以处理大型生物数据集。
Bioinform Biol Insights. 2016 Nov 30;10:255-265. doi: 10.4137/BBI.S40744. eCollection 2016.
9
Towards a HPC-oriented parallel implementation of a learning algorithm for bioinformatics applications.面向高性能计算的生物信息学应用学习算法并行实现
BMC Bioinformatics. 2014;15 Suppl 5(Suppl 5):S2. doi: 10.1186/1471-2105-15-S5-S2. Epub 2014 May 6.
10
parMATT: parallel multiple alignment of protein 3D-structures with translations and twists for distributed-memory systems.parMATT:分布式内存系统中具有平移和扭转的蛋白质 3D 结构的并行多重比对。
Bioinformatics. 2019 Nov 1;35(21):4456-4458. doi: 10.1093/bioinformatics/btz224.

引用本文的文献

1
A novel framework for discovery and reuse of typical process route driven by symbolic entropy and intelligent optimisation algorithm.一种基于符号熵和智能优化算法的典型过程路线发现和重用的新框架。
PLoS One. 2022 Sep 12;17(9):e0274532. doi: 10.1371/journal.pone.0274532. eCollection 2022.
2
Analysis of Inducing Factors of Chronic Pulmonary Heart Disease Caused by Chronic Obstructive Pulmonary Disease at High Altitude through Epidemiological Investigation under Intelligent Medicine and Big Data.智能医学与大数据下高原地区慢性阻塞性肺疾病致慢性肺源性心脏病发病因素的流行病学调查分析。
J Healthc Eng. 2022 Jan 12;2022:2612074. doi: 10.1155/2022/2612074. eCollection 2022.
3

本文引用的文献

1
Big Data: Astronomical or Genomical?大数据:天文学的还是基因组学的?
PLoS Biol. 2015 Jul 7;13(7):e1002195. doi: 10.1371/journal.pbio.1002195. eCollection 2015 Jul.
2
Analysis of the relationships among Longest Common Subsequences, Shortest Common Supersequences and patterns and its application on pattern discovery in biological sequences.最长公共子序列、最短公共超序列与模式之间的关系分析及其在生物序列模式发现中的应用。
Int J Data Min Bioinform. 2011;5(6):611-25. doi: 10.1504/ijdmb.2011.045413.
3
Limitations and potentials of current motif discovery algorithms.
Computational Strategies for Scalable Genomics Analysis.
可扩展基因组分析的计算策略。
Genes (Basel). 2019 Dec 6;10(12):1017. doi: 10.3390/genes10121017.
当前基序发现算法的局限性与潜力。
Nucleic Acids Res. 2005 Sep 2;33(15):4899-913. doi: 10.1093/nar/gki791. Print 2005.
4
Improved tools for biological sequence comparison.用于生物序列比较的改进工具。
Proc Natl Acad Sci U S A. 1988 Apr;85(8):2444-8. doi: 10.1073/pnas.85.8.2444.
5
Basic local alignment search tool.基本局部比对搜索工具
J Mol Biol. 1990 Oct 5;215(3):403-10. doi: 10.1016/S0022-2836(05)80360-2.