
KPop: accurate and scalable comparative analysis of microbial genomes by sequence embeddings.

Authors

Didelot Xavier, Ribeca Paolo

Affiliations

School of Life Sciences and Department of Statistics, University of Warwick, Coventry, UK.

NIHR Health Protection Research Unit in Genomics and Enabling Data, University of Warwick, Coventry, UK.

Publication information

Genome Biol. 2025 Jun 18;26(1):170. doi: 10.1186/s13059-025-03585-8.

DOI: 10.1186/s13059-025-03585-8
PMID: 40533801
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12175428/

Abstract

Here we introduce KPop, a novel versatile method based on full k-mer spectra and dataset-specific transformations, through which thousands of assembled or unassembled microbial genomes can be quickly compared. Unlike MinHash-based methods that produce distances and have lower resolution, KPop is able to accurately map sequences onto a low-dimensional space. Extensive validation on simulated and real-life viral and bacterial datasets shows that KPop can correctly separate sequences at both species and sub-species levels even when the overall genomic diversity is low. KPop also rapidly identifies related sequences and systematically outperforms MinHash-based methods.
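
The idea of comparing sequences through full k-mer spectra mapped onto a low-dimensional space, as opposed to a MinHash distance, can be illustrated with a toy sketch. This is not KPop's implementation: the abstract does not specify the dataset-specific transformation, so a plain first principal axis found by power iteration is swapped in as a stand-in. All function names, the toy sequences, and the parameters (k, sketch size, sequence length) are assumptions made for illustration only.

```python
import hashlib
import math
import random
from collections import Counter
from itertools import product


def kmer_spectrum(seq, k):
    """Full k-mer spectrum: normalised frequencies of all 4**k possible
    k-mers over ACGT, in lexicographic order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers) or 1  # avoid dividing by zero
    return [counts[m] / total for m in kmers]


def first_principal_coordinate(rows, iters=200):
    """Project each row onto the first principal axis of the centred data,
    found by power iteration. Stand-in for a dataset-specific transform
    (an assumption, not the method described in the paper)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    axis = [1.0 / (j + 1) for j in range(d)]  # deterministic starting vector
    for _ in range(iters):
        scores = [sum(x[j] * axis[j] for j in range(d)) for x in centred]
        axis = [sum(scores[i] * centred[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in axis)) or 1.0
        axis = [c / norm for c in axis]
    return [sum(x[j] * axis[j] for j in range(d)) for x in centred]


def minhash_sketch(seq, k, size):
    """Bottom-`size` MinHash sketch: the `size` smallest distinct k-mer hashes."""
    hashes = {
        int.from_bytes(hashlib.sha1(seq[i:i + k].encode()).digest()[:8], "big")
        for i in range(len(seq) - k + 1)
    }
    return set(sorted(hashes)[:size])


def minhash_jaccard(sketch_a, sketch_b, size):
    """Estimate Jaccard similarity from two bottom-`size` sketches."""
    merged = sorted(sketch_a | sketch_b)[:size]
    shared = sum(1 for h in merged if h in sketch_a and h in sketch_b)
    return shared / len(merged) if merged else 0.0


def random_genome(rng, length, weights):
    """Toy sequence drawn with the given A, C, G, T sampling weights."""
    return "".join(rng.choices("ACGT", weights=weights, k=length))


# Two toy groups that differ in base composition (hypothetical data).
rng = random.Random(0)
at_rich = [random_genome(rng, 500, [4, 1, 1, 4]) for _ in range(3)]
gc_rich = [random_genome(rng, 500, [1, 4, 4, 1]) for _ in range(3)]

# Embedding route: one coordinate per sequence, usable for plotting or clustering.
spectra = [kmer_spectrum(s, 3) for s in at_rich + gc_rich]
coords = first_principal_coordinate(spectra)
print("1-D coordinates:", [round(c, 3) for c in coords])

# MinHash route: a single similarity value per pair of sequences.
sk_a = minhash_sketch(at_rich[0], 8, 50)
sk_b = minhash_sketch(gc_rich[0], 8, 50)
print("MinHash Jaccard estimate:", minhash_jaccard(sk_a, sk_b, 50))
```

The contrast the sketch is meant to show is structural: the spectrum route gives every sequence a position in a shared coordinate space, whereas the MinHash route only yields a pairwise similarity estimate from a subsample of k-mers.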

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dad4/12175428/75d4fa59ee86/13059_2025_3585_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dad4/12175428/40654c9b6117/13059_2025_3585_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dad4/12175428/e5c8965f4a49/13059_2025_3585_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dad4/12175428/7cc78a792f2b/13059_2025_3585_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dad4/12175428/a743e306bbbe/13059_2025_3585_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dad4/12175428/f3996607cf58/13059_2025_3585_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dad4/12175428/6280755d51fb/13059_2025_3585_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dad4/12175428/490c9cd86258/13059_2025_3585_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dad4/12175428/9dadc77ce7f8/13059_2025_3585_Fig9_HTML.jpg

Similar articles

1
KPop: accurate and scalable comparative analysis of microbial genomes by sequence embeddings.
Genome Biol. 2025 Jun 18;26(1):170. doi: 10.1186/s13059-025-03585-8.
2
Assessing the comparative effects of interventions in COPD: a tutorial on network meta-analysis for clinicians.
Respir Res. 2024 Dec 21;25(1):438. doi: 10.1186/s12931-024-03056-x.
3
ScITree: Scalable Bayesian inference of transmission tree from epidemiological and genomic data.
PLoS Comput Biol. 2025 Jun 10;21(6):e1012657. doi: 10.1371/journal.pcbi.1012657. eCollection 2025 Jun.
4
Community views on mass drug administration for soil-transmitted helminths: a qualitative evidence synthesis.
Cochrane Database Syst Rev. 2025 Jun 20;6:CD015794. doi: 10.1002/14651858.CD015794.pub2.
5
The rise and global spread of IMP carbapenemases (1996-2023): a genomic epidemiology study.
medRxiv. 2025 May 26:2025.05.25.25328332. doi: 10.1101/2025.05.25.25328332.
6
PRCFX-DT: a new graph-based approach for feature selection and classification of genomic sequences.
BMC Bioinformatics. 2025 Jun 17;26(1):159. doi: 10.1186/s12859-025-06183-4.
7
Aural toilet (ear cleaning) for chronic suppurative otitis media.
Cochrane Database Syst Rev. 2025 Jun 9;6(6):CD013057. doi: 10.1002/14651858.CD013057.pub3.
8
Interventions for fertility preservation in women with cancer undergoing chemotherapy.
Cochrane Database Syst Rev. 2025 Jun 19;6:CD012891. doi: 10.1002/14651858.CD012891.pub2.
9
Interventions to reduce non-prescription antimicrobial sales in community pharmacies.
Cochrane Database Syst Rev. 2025 Jan 29;1(1):CD013722. doi: 10.1002/14651858.CD013722.pub2.
10
Genomic description of sp. nov., a bacterium collected from the International Space Station that exhibits unique antimicrobial-resistant and virulent phenotype.
mSystems. 2025 Jun 17;10(6):e0053725. doi: 10.1128/msystems.00537-25. Epub 2025 May 20.

Cited by

1
Solanum bulbocastanum nucleotide-binding leucine-rich repeat receptor evolution reveals functional variants and critical residues in Rpi-blb1/RB.
J Integr Plant Biol. 2025 Sep;67(9):2491-2509. doi: 10.1111/jipb.13950. Epub 2025 Jun 17.
