Alignment of high-throughput sequencing data inside in-memory databases.

Authors

Firnkorn Daniel, Knaup-Gregori Petra, Lorenzo Bermejo Justo, Ganzinger Matthias

Affiliation

Institute of Medical Biometry and Informatics, Heidelberg, Germany.

Publication

Stud Health Technol Inform. 2014;205:476-80.

PMID: 25160230

Abstract

In an era of high-throughput DNA sequencing, efficient analysis of DNA sequences is of high importance. Computer-supported DNA analysis is still a time-consuming task. In this paper we explore the potential of a new in-memory database technology using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To ensure that the results are comparable, MySQL was also run in memory, using its integrated memory engine for database table creation. We implemented stored procedures for exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions on recursion in SAP HANA, inexact matching could not be implemented on this platform. Hence, the performance comparison between HANA and MySQL was based on the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which indicates high potential in the new in-memory concepts and may lead to further development of DNA analysis procedures.
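The exact and inexact read searches mentioned in the abstract can be illustrated with a small sketch. The abstract does not show the authors' stored procedures, so the code below is not their implementation: it is a toy Python version of Burrows-Wheeler backward search, the general technique behind BWA, run on a short made-up reference string instead of GRCh37. The function names `build_index`, `exact_search`, and `inexact_search` are hypothetical, and the inexact variant allows only mismatches (no gaps), which is a simplification. It is written recursively to show why a platform with restricted recursion makes this step harder to express.

```python
from collections import Counter


def build_index(reference):
    """Build a toy FM-index: suffix array, C table, and Occ table.

    Uses a naive suffix sort, so it is only suitable for short strings.
    """
    text = reference + "$"  # sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps to the sentinel

    # c_table[ch]: number of characters in text strictly smaller than ch
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    # occ[ch][i]: number of occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for k in occ:
            occ[k][i + 1] = occ[k][i] + (1 if k == ch else 0)
    return sa, c_table, occ


def exact_search(read, index):
    """Return sorted start positions of exact occurrences of read."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for ch in reversed(read):  # backward search: last character first
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


def inexact_search(read, index, max_mismatches):
    """Return sorted start positions matching read with at most
    max_mismatches substitutions (assumption: no insertions or deletions)."""
    sa, c_table, occ = index
    hits = set()

    def recurse(i, budget, lo, hi):
        if lo >= hi:
            return
        if i < 0:  # whole read consumed: every row in the range is a hit
            hits.update(sa[lo:hi])
            return
        for ch in "ACGT":
            if ch not in c_table:
                continue
            cost = 0 if ch == read[i] else 1
            if cost <= budget:
                recurse(i - 1, budget - cost,
                        c_table[ch] + occ[ch][lo],
                        c_table[ch] + occ[ch][hi])

    recurse(len(read) - 1, max_mismatches, 0, len(sa))
    return sorted(hits)


if __name__ == "__main__":
    index = build_index("GATTACAGATTACA")  # made-up reference
    print(exact_search("ATTA", index))
    print(inexact_search("ATTC", index, 1))
```

The exact search is a single loop over the read, which maps naturally onto an iterative stored procedure. The inexact search branches at every position, so it needs either recursion or an explicit stack; this is consistent with the abstract's remark that recursion limits prevented the inexact variant on HANA.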
