Multi-metric locality sensitive hashing enhances alignment accuracy of bisulfite sequencing reads: BisHash.

Authors

Nikaein Hassan, Sharifi-Zarchi Ali

Affiliation

Department of Computer Engineering, Sharif University of Technology, Tehran, 1458889694, Iran.

Publication

Bioinform Adv. 2025 Jul 23;5(1):vbaf144. doi: 10.1093/bioadv/vbaf144. eCollection 2025.

DOI: 10.1093/bioadv/vbaf144
PMID: 40831761
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12360834/
Abstract

MOTIVATION

Locality-Sensitive Hashing (LSH) is a widely used algorithm for estimating similarity between large datasets in bioinformatics, with applications in genome assembly, sequence alignment, and metagenomics. However, traditional single-metric LSH approaches often lead to inefficiencies, particularly when handling biological data where regions may have diverse evolutionary histories or structural properties. This limitation can reduce accuracy in sequence alignment, variant calling, and functional analysis.
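To make "estimating similarity" concrete, here is a minimal sketch of MinHash, a standard LSH family for Jaccard similarity between k-mer sets, together with the usual banding step that turns signatures into bucket keys. This is a generic textbook illustration, not code from BisHash; the function names, k = 8, 128 hash functions, and the 32 × 4 banding are arbitrary assumptions chosen for readability.

```python
import hashlib


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings (k-mers) of seq, as a set."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seeded_hash(kmer: str, seed: int) -> int:
    """Deterministic 64-bit hash; each seed acts as an independent hash function."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def minhash_signature(items: set[str], num_hashes: int = 128) -> list[int]:
    """Keep only the minimum hash value of the set under each hash function."""
    return [min(seeded_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The probability that two minima agree equals |A & B| / |A | B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def band_keys(sig: list[int], bands: int, rows: int) -> set[tuple]:
    """LSH banding: each band of `rows` values, tagged with its index, is a bucket key."""
    assert bands * rows == len(sig)
    return {(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(bands)}


def is_candidate(sig_a: list[int], sig_b: list[int],
                 bands: int = 32, rows: int = 4) -> bool:
    """Two sets become a candidate pair if they collide in at least one band."""
    return bool(band_keys(sig_a, bands, rows) & band_keys(sig_b, bands, rows))


if __name__ == "__main__":
    ref = "ACGTACGTTGCAACGTTAGCCGATCGATGCA"
    read = "ACGTACGTTGCAACGTTAGCTGATCGATGCA"  # one C->T difference
    a, b = kmers(ref, 8), kmers(read, 8)
    sa, sb = minhash_signature(a), minhash_signature(b)
    exact = len(a & b) / len(a | b)
    print(f"exact={exact:.2f} estimate={estimate_jaccard(sa, sb):.2f}")
    print("candidate pair:", is_candidate(sa, sb))
```

A single base change alters every k-mer that overlaps it, which is why single-metric k-mer similarity can drop sharply on reads that differ systematically from the reference, as bisulfite-converted reads do.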

RESULTS

We propose Multi-Metric Locality-Sensitive Hashing (M2LSH), an extension of LSH that integrates multiple similarity metrics for more accurate analysis of complex biological data. By capturing diverse sequence and structural features, M2LSH improves performance in heterogeneous and evolutionarily diverse regions. Building on this, we introduce Multi-Metric MinHash (M3Hash), enhancing sequence alignment and similarity detection. As a proof of concept, we present BisHash, which applies M2LSH to bisulfite sequencing, a key method in DNA methylation analysis. Although not fully optimized, BisHash demonstrates superior accuracy, particularly in challenging scenarios like cancer studies where traditional approaches often fail. Our results highlight the potential of M2LSH and M3Hash to advance bioinformatics research.
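The abstract does not specify which metrics M2LSH and M3Hash combine or how they are weighted, so the following is only a hypothetical illustration of the general multi-metric idea applied to bisulfite reads. Each "metric" here is a view of the sequence with its own MinHash sketch: the raw sequence, and an in-silico C→T converted sequence that ignores bisulfite conversion. The two similarity estimates are then combined with a weighted sum. The metric names, the 0.3/0.7 weights, and the toy read simulator (which pretends every CpG cytosine is methylated) are all assumptions, not BisHash's method.

```python
import hashlib


def kmers(seq: str, k: int) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature(items: set[str], num_hashes: int = 128) -> list[int]:
    """MinHash signature: the minimum seeded 64-bit hash per hash function."""
    def h(x: str, seed: int) -> int:
        digest = hashlib.blake2b(f"{seed}:{x}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little")
    return [min(h(x, seed) for x in items) for seed in range(num_hashes)]


def similarity(sig_a: list[int], sig_b: list[int]) -> float:
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def c_to_t(seq: str) -> str:
    """In-silico bisulfite conversion: every C becomes T (methylation-blind view)."""
    return seq.replace("C", "T")


# Hypothetical metrics: each is a view of the sequence that gets its own sketch.
METRICS = {"raw": lambda s: s, "c_to_t": c_to_t}
# Hypothetical weights for combining the per-metric estimates.
WEIGHTS = {"raw": 0.3, "c_to_t": 0.7}


def sketch(seq: str, k: int = 8) -> dict[str, list[int]]:
    """One MinHash signature per metric."""
    return {name: signature(kmers(view(seq), k)) for name, view in METRICS.items()}


def per_metric(sk_a: dict, sk_b: dict) -> dict[str, float]:
    return {name: similarity(sk_a[name], sk_b[name]) for name in METRICS}


def combined_score(sk_a: dict, sk_b: dict) -> float:
    """Weighted sum of the per-metric similarity estimates."""
    scores = per_metric(sk_a, sk_b)
    return sum(WEIGHTS[name] * scores[name] for name in METRICS)


def simulate_bisulfite(ref: str) -> str:
    """Toy read: C->T everywhere except CpG sites, which we pretend are methylated."""
    return "".join("T" if b == "C" and ref[i + 1:i + 2] != "G" else b
                   for i, b in enumerate(ref))


if __name__ == "__main__":
    ref = "ACCTGACGTTCACCGTACCATGCCAGTCACGATCCAGT"
    read = simulate_bisulfite(ref)
    s_ref, s_read = sketch(ref), sketch(read)
    print(per_metric(s_ref, s_read), round(combined_score(s_ref, s_read), 2))
```

In this toy case the raw metric sees almost no shared k-mers, while the converted view sees the read and reference as identical. Combining both views keeps methylation-sensitive information from the raw sequence without losing the match entirely.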

AVAILABILITY AND IMPLEMENTATION

The source code for BisHash and the test procedures for benchmarking aligners using simulated data are publicly accessible at https://github.com/hnikaein/bisHash.
