Alignment of high-throughput sequencing data inside in-memory databases.

Authors

Firnkorn Daniel, Knaup-Gregori Petra, Lorenzo Bermejo Justo, Ganzinger Matthias

Affiliation

Institute of Medical Biometry and Informatics, Heidelberg, Germany.

Publication

Stud Health Technol Inform. 2014;205:476-80.

PMID:25160230

Abstract

In the era of high-throughput DNA sequencing, efficient analysis of DNA sequences is of high importance, yet computer-supported DNA analysis remains a time-consuming task. In this paper we explore the potential of in-memory database technology using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment, one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To keep the results comparable, MySQL was also run in memory, using its integrated memory engine for database table creation. The stored procedures cover exact and inexact searching of DNA reads within the reference genome GRCh37. Because of technical restrictions on recursion in SAP HANA, the inexact matching procedure could not be implemented on that platform. The performance comparison between HANA and MySQL was therefore based on the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which suggests that in-memory concepts have high potential for the further development of DNA analysis procedures.
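To make the two search modes named in the abstract concrete, the following is a minimal Python sketch of BWA-style backward search over a Burrows-Wheeler index. It is an illustration only and not the authors' stored procedures: the function names (`build_fm_index`, `exact_search`, `inexact_search`) are hypothetical, the index is built naively for a toy reference rather than GRCh37, and the inexact variant handles mismatches only, whereas BWA itself also considers gaps and prunes the search more aggressively. The inexact search is written recursively, which hints at why a platform with restrictions on recursion makes it harder to port than the exact search.

```python
def build_fm_index(reference):
    """Build a toy FM-index: suffix array, C table, and Occ table."""
    text = reference + "$"  # "$" is a sentinel that sorts before A, C, G, T
    # Naive suffix array construction; acceptable for short toy references only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" at i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[c]: number of characters in text that are strictly smaller than c.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def exact_search(read, index):
    """Return sorted start positions of exact occurrences of read."""
    sa, c_table, occ = index
    if not read:
        return []
    lo, hi = 0, len(sa)  # half-open interval of suffix array rows
    for ch in reversed(read):  # backward search: last character first
        if ch == "$" or ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


def inexact_search(read, index, max_mismatches):
    """Return sorted start positions matching read with a bounded number of
    substitutions (mismatches only; a simplification of BWA's inexact search)."""
    sa, c_table, occ = index
    hits = set()

    def recurse(i, budget, lo, hi):
        if i < 0:  # whole read consumed: every row in [lo, hi) is a hit
            hits.update(sa[lo:hi])
            return
        for ch in "ACGT":
            if ch not in c_table:
                continue
            new_lo = c_table[ch] + occ[ch][lo]
            new_hi = c_table[ch] + occ[ch][hi]
            if new_lo >= new_hi:
                continue
            cost = 0 if ch == read[i] else 1
            if budget >= cost:
                recurse(i - 1, budget - cost, new_lo, new_hi)

    if read:
        recurse(len(read) - 1, max_mismatches, 0, len(sa))
    return sorted(hits)


if __name__ == "__main__":
    index = build_fm_index("GATTACAGATTACA")
    print(exact_search("ATTA", index))
    print(inexact_search("ATCA", index, 1))
```

The exact search is a simple loop that narrows one suffix array interval per read character, so it maps naturally onto an iterative stored procedure. The inexact search branches over all four bases at every position, and the recursion depth equals the read length.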
