rHAT: fast alignment of noisy long reads with regional hashing.

Affiliation

Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.

Publication information

Bioinformatics. 2016 Jun 1;32(11):1625-31. doi: 10.1093/bioinformatics/btv662. Epub 2015 Nov 14.

DOI:10.1093/bioinformatics/btv662
PMID:26568628
Abstract

MOTIVATION

Single Molecule Real-Time (SMRT) sequencing has been widely applied in cutting-edge genomic studies. However, aligning noisy long SMRT reads to a reference genome with state-of-the-art aligners is still computationally expensive, which is becoming a bottleneck in applications of SMRT sequencing. Novel approaches are needed to improve the efficiency and effectiveness of SMRT read alignment.

RESULTS

We propose the Regional Hashing-based Alignment Tool (rHAT), a seed-and-extension-based read alignment approach specifically designed for noisy long reads. rHAT indexes the reference genome with a regional hash table (RHT), a hash table-based index that describes the short tokens within local windows of the reference genome. In the seeding phase, rHAT uses the RHT to efficiently count short token matches between part of the read and local genomic windows, identifying the most likely candidate sites. In the extension phase, a sparse dynamic programming-based heuristic reduces the cost of aligning the read to the candidate sites. Benchmarks on real and simulated datasets from various prokaryotic and eukaryotic genomes show that rHAT can align SMRT reads effectively with high throughput.
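
The seeding and extension ideas described above can be illustrated with a small Python sketch. It is not rHAT's implementation (rHAT is written in C++), and the function names, the window layout (fixed-size windows starting every `step` bases), and the simple quadratic chaining step are all assumptions made for illustration. The sketch builds a table from short tokens (k-mers) to the reference windows that contain them, counts token matches between part of a read and each window, and then chains colinear seed hits with dynamic programming restricted to those hits.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every k-mer (short token) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_rht(reference, k, window, step):
    """Map each k-mer to the ids of the reference windows containing it.

    Hypothetical layout: window i covers reference[i*step : i*step + window].
    """
    table = defaultdict(set)
    starts = range(0, max(len(reference) - k + 1, 1), step)
    for wid, start in enumerate(starts):
        for token in kmers(reference[start:start + window], k):
            table[token].add(wid)
    return table


def rank_windows(table, read_part, k, top=3):
    """Count k-mer matches between part of a read and every window.

    Returns up to `top` (window id, match count) pairs, best first.
    """
    counts = Counter()
    for token in kmers(read_part, k):
        for wid in table.get(token, ()):
            counts[wid] += 1
    return counts.most_common(top)


def chain_anchors(anchors):
    """Longest chain of (read_pos, ref_pos) seed hits increasing in both
    coordinates. The DP runs only over seed hits, not over the full
    read-by-reference matrix (a simple O(n^2) stand-in for a sparse DP
    heuristic)."""
    anchors = sorted(anchors)
    if not anchors:
        return []
    n = len(anchors)
    best = [1] * n    # best[j]: length of the best chain ending at anchor j
    prev = [-1] * n   # prev[j]: predecessor of anchor j in that chain
    for j in range(n):
        for i in range(j):
            colinear = (anchors[i][0] < anchors[j][0]
                        and anchors[i][1] < anchors[j][1])
            if colinear and best[i] + 1 > best[j]:
                best[j] = best[i] + 1
                prev[j] = i
    end = max(range(n), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(anchors[end])
        end = prev[end]
    return chain[::-1]
```

In this sketch, the windows with the highest match counts play the role of candidate sites, and the chained anchors outline the region where a full base-level alignment would then be computed. Overlapping windows (`step` smaller than `window`) reduce the chance that a read fragment straddles a window boundary.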

AVAILABILITY AND IMPLEMENTATION

rHAT is implemented in C++; the source code is available at https://github.com/HIT-Bioinformatics/rHAT

CONTACT

ydwang@hit.edu.cn

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1
rHAT: fast alignment of noisy long reads with regional hashing.
Bioinformatics. 2016 Jun 1;32(11):1625-31. doi: 10.1093/bioinformatics/btv662. Epub 2015 Nov 14.
2
LAMSA: fast split read alignment with long approximate matches.
Bioinformatics. 2017 Jan 15;33(2):192-201. doi: 10.1093/bioinformatics/btw594. Epub 2016 Sep 25.
3
deBGA: read alignment with de Bruijn graph-based seed and extension.
Bioinformatics. 2016 Nov 1;32(21):3224-3232. doi: 10.1093/bioinformatics/btw371. Epub 2016 Jul 4.
4
conLSH: Context based Locality Sensitive Hashing for mapping of noisy SMRT reads.
Comput Biol Chem. 2020 Apr;85:107206. doi: 10.1016/j.compbiolchem.2020.107206. Epub 2020 Jan 18.
5
rMFilter: acceleration of long read-based structure variation calling by chimeric read filtering.
Bioinformatics. 2017 Sep 1;33(17):2750-2752. doi: 10.1093/bioinformatics/btx279.
6
lordFAST: sensitive and Fast Alignment Search Tool for LOng noisy Read sequencing Data.
Bioinformatics. 2019 Jan 1;35(1):20-27. doi: 10.1093/bioinformatics/bty544.
7
Arioc: GPU-accelerated alignment of short bisulfite-treated reads.
Bioinformatics. 2018 Aug 1;34(15):2673-2675. doi: 10.1093/bioinformatics/bty167.
8
Vargas: heuristic-free alignment for assessing linear and graph read aligners.
Bioinformatics. 2020 Jun 1;36(12):3712-3718. doi: 10.1093/bioinformatics/btaa265.
9
GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping.
Bioinformatics. 2017 Nov 1;33(21):3355-3363. doi: 10.1093/bioinformatics/btx342.
10
Fast and SNP-aware short read alignment with SALT.
BMC Bioinformatics. 2021 Aug 25;22(Suppl 9):172. doi: 10.1186/s12859-021-04088-6.

Cited by

1
A survey of sequence-to-graph mapping algorithms in the pangenome era.
Genome Biol. 2025 May 22;26(1):138. doi: 10.1186/s13059-025-03606-6.
2
Fast noisy long read alignment with multi-level parallelism.
BMC Bioinformatics. 2025 May 2;26(1):118. doi: 10.1186/s12859-025-06129-w.
3
Impact of Alignments on the Accuracy of Protein Subcellular Localization Predictions.
Proteins. 2025 Mar;93(3):745-759. doi: 10.1002/prot.26767. Epub 2024 Nov 22.
4
pathMap: a path-based mapping tool for long noisy reads with high sensitivity.
Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae107.
5
Benchmarking long-read genome sequence alignment tools for human genomics applications.
PeerJ. 2023 Dec 18;11:e16515. doi: 10.7717/peerj.16515. eCollection 2023.
6
invMap: a sensitive mapping tool for long noisy reads with inversion structural variants.
Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad726.
7
A survey of mapping algorithms in the long-reads era.
Genome Biol. 2023 Jun 1;24(1):133. doi: 10.1186/s13059-023-02972-3.
8
BLEND: a fast, memory-efficient and accurate mechanism to find fuzzy seed matches in genome analysis.
NAR Genom Bioinform. 2023 Jan 20;5(1):lqad004. doi: 10.1093/nargab/lqad004. eCollection 2023 Mar.
9
From molecules to genomic variations: Accelerating genome analysis via intelligent algorithms and architectures.
Comput Struct Biotechnol J. 2022 Aug 18;20:4579-4599. doi: 10.1016/j.csbj.2022.08.019. eCollection 2022.
10
kngMap: Sensitive and Fast Mapping Algorithm for Noisy Long Reads Based on the k-Mer Neighborhood Graph.
Front Genet. 2022 May 5;13:890651. doi: 10.3389/fgene.2022.890651. eCollection 2022.