Prefiltering Model for Homology Detection Algorithms on GPU.

Author information

Retamosa Germán, de Pedro Luis, González Ivan, Tamames Javier

Affiliations

High Performance Computing and Networking Department, Universidad Autónoma de Madrid, Madrid, Spain.

National Center for Biotechnology, CSIC, Madrid, Spain.

Publication information

Evol Bioinform Online. 2016 Dec 18;12:313-322. doi: 10.4137/EBO.S40877. eCollection 2016.

DOI:10.4137/EBO.S40877

PMID:28008220

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5170890/

Abstract

Homology detection has evolved over time from heavy algorithms based on dynamic programming to lightweight alternatives based on different heuristic models. However, the main problem with these algorithms is that they use complex statistical models, which makes it difficult to achieve a relevant speedup and find exact matches with the original results. Thus, their acceleration is essential. The aim of this article was to prefilter a sequence database. To make this work, we have implemented a groundbreaking heuristic model based on NVIDIA's graphics processing units (GPUs) and multicore processors. Depending on the sensitivity settings, this makes it possible to quickly reduce the sequence database by between 50% and 95% while rejecting no significant sequences. Furthermore, this prefiltering application can be used together with multiple homology detection algorithms as part of a next-generation sequencing system. Extensive performance and accuracy tests have been carried out at the Spanish National Centre for Biotechnology (NCB). The results show that GPU hardware can accelerate the execution times of earlier homology detection applications, such as the National Centre for Biotechnology Information (NCBI) Basic Local Alignment Search Tool for Proteins (BLASTP), by up to a factor of 4.
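The abstract does not describe how the prefilter decides which sequences to keep, so the following is only a minimal CPU sketch of the general idea of prefiltering a sequence database before running a slower homology search. It assumes a simple shared k-mer criterion, which is not taken from the article and should not be read as the authors' heuristic or their GPU implementation. The names `kmers`, `prefilter`, `k`, and `min_shared` are hypothetical, with `min_shared` standing in for a sensitivity setting.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def prefilter(query: str, database: Dict[str, str],
              k: int = 3, min_shared: int = 2) -> List[str]:
    """Return IDs of database sequences sharing at least min_shared k-mers with the query.

    Assumption: shared k-mer counting is an illustrative stand-in criterion,
    not the heuristic model described in the article.
    """
    query_kmers = kmers(query, k)
    return [
        seq_id
        for seq_id, seq in database.items()
        if len(query_kmers & kmers(seq, k)) >= min_shared
    ]


if __name__ == "__main__":
    # Toy database; the sequences are made up for illustration.
    db = {
        "seq1": "MKVLAAGIVGLLLA",
        "seq2": "PPPPPPPPPPPP",
        "seq3": "MKVLSAGIVALLLA",
    }
    kept = prefilter("MKVLAAGIVGLL", db, k=3, min_shared=2)
    print(kept)  # only the surviving IDs would be passed to the full search
```

Raising `min_shared` discards more of the database at the risk of dropping true homologs, which is the kind of trade-off a sensitivity setting controls.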

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f4cc/5170890/f5dfe8cc0dfe/ebo-12-2016-313f1.jpg

Similar articles

CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment.

BMC Bioinformatics. 2008 Mar 26;9 Suppl 2(Suppl 2):S10. doi: 10.1186/1471-2105-9-S2-S10.

CUDAMPF: a multi-tiered parallel framework for accelerating protein sequence search in HMMER on CUDA-enabled GPU.

BMC Bioinformatics. 2016 Feb 27;17:106. doi: 10.1186/s12859-016-0946-4.

NMF-mGPU: non-negative matrix factorization on multi-GPU systems.

BMC Bioinformatics. 2015 Feb 13;16:43. doi: 10.1186/s12859-015-0485-4.

GAMUT: GPU accelerated microRNA analysis to uncover target genes through CUDA-miRanda.

BMC Med Genomics. 2014;7 Suppl 1(Suppl 1):S9. doi: 10.1186/1755-8794-7-S1-S9. Epub 2014 May 8.

CUDA-BLASTP: accelerating BLASTP on CUDA-enabled graphics hardware.

IEEE/ACM Trans Comput Biol Bioinform. 2011 Nov-Dec;8(6):1678-84. doi: 10.1109/TCBB.2011.33.

G-BLASTN: accelerating nucleotide alignment by graphics processors.

Bioinformatics. 2014 May 15;30(10):1384-91. doi: 10.1093/bioinformatics/btu047. Epub 2014 Jan 24.

GHOSTM: a GPU-accelerated homology search tool for metagenomics.

PLoS One. 2012;7(5):e36060. doi: 10.1371/journal.pone.0036060. Epub 2012 May 4.

muBLASTP: database-indexed protein sequence search on multicore CPUs.

BMC Bioinformatics. 2016 Nov 4;17(1):443. doi: 10.1186/s12859-016-1302-4.

H-BLAST: a fast protein sequence alignment toolkit on heterogeneous computers with GPUs.

Bioinformatics. 2017 Apr 15;33(8):1130-1138. doi: 10.1093/bioinformatics/btw769.

References cited by this article

dRHP-PseRA: detecting remote homology proteins using profile-based pseudo protein sequence and rank aggregation.

Sci Rep. 2016 Sep 1;6:32333. doi: 10.1038/srep32333.

Application of learning to rank to protein remote homology detection.

Bioinformatics. 2015 Nov 1;31(21):3492-8. doi: 10.1093/bioinformatics/btv413. Epub 2015 Jul 10.

Protein remote homology detection by combining Chou's distance-pair pseudo amino acid composition and principal component analysis.

Mol Genet Genomics. 2015 Oct;290(5):1919-31. doi: 10.1007/s00438-015-1044-4. Epub 2015 Apr 21.

Using distances between Top-n-gram and residue pairs for protein remote homology detection.

BMC Bioinformatics. 2014;15 Suppl 2(Suppl 2):S3. doi: 10.1186/1471-2105-15-S2-S3. Epub 2014 Jan 24.

CUDA-BLASTP: accelerating BLASTP on CUDA-enabled graphics hardware.

IEEE/ACM Trans Comput Biol Bioinform. 2011 Nov-Dec;8(6):1678-84. doi: 10.1109/TCBB.2011.33.

GPU-BLAST: using graphics processors to accelerate protein sequence alignment.

Bioinformatics. 2011 Jan 15;27(2):182-8. doi: 10.1093/bioinformatics/btq644. Epub 2010 Nov 18.

CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions.

BMC Res Notes. 2010 Apr 6;3:93. doi: 10.1186/1756-0500-3-93.

Mercury BLASTP: Accelerating Protein Sequence Alignment.

ACM Trans Reconfigurable Technol Syst. 2008 Jun;1(2):9. doi: 10.1145/1371579.1371581.

CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units.

BMC Res Notes. 2009 May 6;2:73. doi: 10.1186/1756-0500-2-73.

CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment.

BMC Bioinformatics. 2008 Mar 26;9 Suppl 2(Suppl 2):S10. doi: 10.1186/1471-2105-9-S2-S10.
