
Efficient sequence alignment against millions of prokaryotic genomes with LexicMap.

Authors

Shen Wei, Lees John A, Iqbal Zamin

Affiliations

Department of Infectious Diseases, Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Institute for Viral Hepatitis, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, China.

European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.

Publication

Nat Biotechnol. 2025 Sep 10. doi: 10.1038/s41587-025-02812-8.

DOI:10.1038/s41587-025-02812-8
PMID:40931109
Abstract

The size of microbial sequence databases continues to grow beyond the abilities of existing alignment tools. We introduce LexicMap, a nucleotide sequence alignment tool for efficiently querying moderate-length sequences (>250 bp) such as a gene, plasmid or long read against up to millions of prokaryotic genomes. We construct a small set of probe k-mers, which are selected to efficiently sample the entire database to be indexed such that every 250-bp window of each database genome contains multiple seed k-mers, each with a shared prefix with one of the probes. Storing these seeds in a hierarchical index enables fast and low-memory alignment. We benchmark both accuracy and potential to scale to databases of millions of bacterial genomes, showing that LexicMap achieves comparable accuracy to state-of-the-art methods but with greater speed and lower memory use. Our method supports querying at scale and within minutes, which will be useful for many biological applications across epidemiology, ecology and evolution.
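The probe-and-seed idea in the abstract can be illustrated with a toy Python sketch. It is not LexicMap's implementation. Several parts are assumptions made for illustration: the probe design (one probe per possible prefix of length `prefix_len`, padded with 'A'), the capture rule (each probe keeps the genome k-mer that shares the longest prefix with it), the flat dictionary standing in for the hierarchical index, and all function names. The window check only measures how many seeds fall inside each window. It does not enforce the abstract's guarantee of multiple seeds per 250-bp window, and the demo uses a 10-bp window so that it fits a 20-bp genome.

```python
from itertools import product


def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def make_probes(prefix_len: int, k: int) -> list[str]:
    """Hypothetical probe set: one probe per DNA prefix of length prefix_len,
    padded to length k with 'A'. Not LexicMap's actual probe design."""
    return ["".join(p) + "A" * (k - prefix_len)
            for p in product("ACGT", repeat=prefix_len)]


def capture_seeds(seq: str, probes: list[str], k: int) -> dict[int, tuple[int, int]]:
    """Assumed capture rule: each probe keeps the k-mer of seq sharing the
    longest prefix with it (ties go to the leftmost position).
    Returns {probe_index: (position, shared_prefix_length)}."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    seeds = {}
    for p_idx, probe in enumerate(probes):
        best_pos, best_len = 0, -1
        for pos, kmer in enumerate(kmers):
            shared = lcp_len(kmer, probe)
            if shared > best_len:
                best_pos, best_len = pos, shared
        seeds[p_idx] = (best_pos, best_len)
    return seeds


def min_seeds_per_window(positions: set[int], seq_len: int, k: int, window: int) -> int:
    """Fewest distinct seed k-mers lying entirely inside any window of the given width."""
    worst = len(positions)
    for w_start in range(seq_len - window + 1):
        w_end = w_start + window  # exclusive end
        inside = sum(1 for s in positions if w_start <= s and s + k <= w_end)
        worst = min(worst, inside)
    return worst


def build_index(genomes: dict[str, str], probes: list[str], k: int) -> dict[int, list[tuple[str, int, str]]]:
    """Flat stand-in for the hierarchical index: probe_index -> [(genome_id, position, seed_kmer)]."""
    index = {i: [] for i in range(len(probes))}
    for gid, seq in genomes.items():
        for p_idx, (pos, _) in capture_seeds(seq, probes, k).items():
            index[p_idx].append((gid, pos, seq[pos:pos + k]))
    return index


def lookup(index, query: str, probes: list[str], k: int, min_prefix: int) -> set[tuple[str, int, int]]:
    """Anchor hits (genome_id, genome_pos, query_pos): the query is seeded with the
    same probes, and a stored seed matches if it shares >= min_prefix leading bases."""
    hits = set()
    for p_idx, (q_pos, _) in capture_seeds(query, probes, k).items():
        q_kmer = query[q_pos:q_pos + k]
        for gid, pos, kmer in index[p_idx]:
            if lcp_len(q_kmer, kmer) >= min_prefix:
                hits.add((gid, pos, q_pos))
    return hits


if __name__ == "__main__":
    genome = "AAAAACCCCCGGGGGTTTTT"
    k = 3
    probes = make_probes(1, k)  # ["AAA", "CAA", "GAA", "TAA"]
    seeds = capture_seeds(genome, probes, k)
    print(seeds)  # {0: (0, 3), 1: (5, 1), 2: (10, 1), 3: (15, 1)}
    positions = {pos for pos, _ in seeds.values()}
    print(min_seeds_per_window(positions, len(genome), k, window=10))  # 1
    index = build_index({"g1": genome}, probes, k)
    print(lookup(index, "CCCCG", probes, k, min_prefix=3))  # {('g1', 5, 0)}
```

The sketch scans every k-mer for every probe, which is quadratic and only suitable for toy inputs. It trades efficiency for readability so that the capture, coverage and prefix-lookup steps are each visible in one short function.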


Similar articles

1
Efficient sequence alignment against millions of prokaryotic genomes with LexicMap.
Nat Biotechnol. 2025 Sep 10. doi: 10.1038/s41587-025-02812-8.
2
Prescription of Controlled Substances: Benefits and Risks
3
Aspects of Genetic Diversity, Host Specificity and Public Health Significance of Single-Celled Intestinal Parasites Commonly Observed in Humans and Mostly Referred to as 'Non-Pathogenic'.
APMIS. 2025 Sep;133(9):e70036. doi: 10.1111/apm.70036.
4
Anterior Approach Total Ankle Arthroplasty with Patient-Specific Cut Guides.
JBJS Essent Surg Tech. 2025 Aug 15;15(3). doi: 10.2106/JBJS.ST.23.00027. eCollection 2025 Jul-Sep.
5
Clinical symptoms, signs and tests for identification of impending and current water-loss dehydration in older people.
Cochrane Database Syst Rev. 2015 Apr 30;2015(4):CD009647. doi: 10.1002/14651858.CD009647.pub2.
6
Falls prevention interventions for community-dwelling older adults: systematic review and meta-analysis of benefits, harms, and patient values and preferences.
Syst Rev. 2024 Nov 26;13(1):289. doi: 10.1186/s13643-024-02681-3.
7
Short-Term Memory Impairment
8
Acupuncture for neonatal abstinence syndrome in newborn infants.
Cochrane Database Syst Rev. 2025 Feb 21;2(2):CD014160. doi: 10.1002/14651858.CD014160.pub2.
9
Factors that impact on the use of mechanical ventilation weaning protocols in critically ill adults and children: a qualitative evidence-synthesis.
Cochrane Database Syst Rev. 2016 Oct 4;10(10):CD011812. doi: 10.1002/14651858.CD011812.pub2.
10
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
