Whisper: read sorting allows robust mapping of DNA sequencing data.

Affiliations

Institute of Informatics, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, Gliwice, PL, Poland.

Institute of Applied Computer Science, Faculty of Electrical, Electronic, Computer and Control Engineering, Lodz University of Technology, Stefanowskiego 18/22, Łódź, PL, Poland.

Publication information

Bioinformatics. 2019 Jun 1;35(12):2043-2050. doi: 10.1093/bioinformatics/bty927.

DOI:10.1093/bioinformatics/bty927
PMID:30407485
Abstract

MOTIVATION

Mapping reads to a reference genome is often the first step in a sequencing data analysis pipeline. The reduction of sequencing costs implies a need for algorithms able to process increasing amounts of generated data in reasonable time.

RESULTS

We present Whisper, an accurate and high-performant mapping tool, based on the idea of sorting reads and then mapping them against suffix arrays for the reference genome and its reverse complement. Employing task and data parallelism as well as storing temporary data on disk result in superior time efficiency at reasonable memory requirements. Whisper excels at large NGS read collections, in particular Illumina reads with typical WGS coverage. The experiments with real data indicate that our solution works in about 15% of the time needed by the well-known BWA-MEM and Bowtie2 tools at a comparable accuracy, validated in a variant calling pipeline.
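The core idea in the paragraph above (sort the reads, then look them up in suffix arrays built for the reference and for its reverse complement) can be illustrated with a toy sketch. This is not Whisper's implementation: it handles exact matches only, builds the suffix arrays naively, keeps everything in memory, and uses no parallelism. All function names (`reverse_complement`, `build_suffix_array`, `lower_bound`, `map_reads`) are hypothetical and chosen for illustration. The sketch does show one reason sorted input can help: each binary search can start where the previous one ended, so consecutive lookups move forward through the index.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def build_suffix_array(text):
    """Naive suffix array construction; acceptable only for toy inputs."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lower_bound(text, sa, read, lo=0):
    """First suffix-array index >= lo whose suffix prefix is >= read."""
    hi = len(sa)
    m = len(read)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < read:
            lo = mid + 1
        else:
            hi = mid
    return lo


def map_reads(reference, reads):
    """Map reads exactly to both strands; return {read: [(strand, pos)]}.

    Positions are 0-based forward-strand coordinates of the leftmost base.
    """
    rc = reverse_complement(reference)
    indexes = {
        "+": (reference, build_suffix_array(reference)),
        "-": (rc, build_suffix_array(rc)),
    }
    hits = {}
    for strand, (text, sa) in indexes.items():
        start = 0
        # Sorted order means each search can resume from the previous bound.
        for read in sorted(set(reads)):
            start = lower_bound(text, sa, read, start)
            i = start
            while i < len(sa) and text[sa[i]:sa[i] + len(read)] == read:
                pos = sa[i]
                if strand == "-":
                    # Convert a reverse-complement offset to forward coordinates.
                    pos = len(text) - pos - len(read)
                hits.setdefault(read, []).append((strand, pos))
                i += 1
    return {read: sorted(found) for read, found in hits.items()}


if __name__ == "__main__":
    print(map_reads("GATTACAGG", ["TTAC", "GTAA", "AAAA"]))
```

In this toy run, `TTAC` is found on the forward strand, `GTAA` is found only in the reverse complement and is reported at the forward-strand coordinate of its complement, and `AAAA` has no hit. A real mapper must also tolerate mismatches and indels, which this sketch deliberately omits.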

AVAILABILITY AND IMPLEMENTATION

Whisper is available for free from https://github.com/refresh-bio/Whisper or http://sun.aei.polsl.pl/REFRESH/Whisper/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1. Whisper: read sorting allows robust mapping of DNA sequencing data.
   Bioinformatics. 2019 Jun 1;35(12):2043-2050. doi: 10.1093/bioinformatics/bty927.
2. Kmer-db: instant evolutionary distance estimation.
   Bioinformatics. 2019 Jan 1;35(1):133-136. doi: 10.1093/bioinformatics/bty610.
3. Large scale microbiome profiling in the cloud.
   Bioinformatics. 2019 Jul 15;35(14):i13-i22. doi: 10.1093/bioinformatics/btz356.
4. Evaluation of variant calling tools for large plant genome re-sequencing.
   BMC Bioinformatics. 2020 Aug 17;21(1):360. doi: 10.1186/s12859-020-03704-1.
5. RECKONER: read error corrector based on KMC.
   Bioinformatics. 2017 Apr 1;33(7):1086-1089. doi: 10.1093/bioinformatics/btw746.
6. GTC: how to maintain huge genotype collections in a compressed form.
   Bioinformatics. 2018 Jun 1;34(11):1834-1840. doi: 10.1093/bioinformatics/bty023.
7. Compression of genomic sequencing reads via hash-based reordering: algorithm and analysis.
   Bioinformatics. 2018 Feb 15;34(4):558-567. doi: 10.1093/bioinformatics/btx639.
8. Novel algorithms for efficient subsequence searching and mapping in nanopore raw signals towards targeted sequencing.
   Bioinformatics. 2020 Mar 1;36(5):1333-1343. doi: 10.1093/bioinformatics/btz742.
9. Nubeam-dedup: a fast and RAM-efficient tool to de-duplicate sequencing reads without mapping.
   Bioinformatics. 2020 May 1;36(10):3254-3256. doi: 10.1093/bioinformatics/btaa112.
10. Assessing the impact of exact reads on reducing the error rate of read mapping.
    BMC Bioinformatics. 2018 Nov 6;19(1):406. doi: 10.1186/s12859-018-2432-7.

Cited by

1. Taming large-scale genomic analyses via sparsified genomics.
   Nat Commun. 2025 Jan 21;16(1):876. doi: 10.1038/s41467-024-55762-1.
2. Meta-transcriptomics for the diversity of tick-borne virus in Nujiang, Yunnan Province.
   Front Cell Infect Microbiol. 2023 Dec 15;13:1283019. doi: 10.3389/fcimb.2023.1283019. eCollection 2023.
3. A time-series meta-transcriptomic analysis reveals the seasonal, host, and gender structure of mosquito viromes.
   Virus Evol. 2022 Feb 2;8(1):veac006. doi: 10.1093/ve/veac006. eCollection 2022.