KPop: accurate and scalable comparative analysis of microbial genomes by sequence embeddings.

Author information

Didelot Xavier, Ribeca Paolo

Affiliations

School of Life Sciences and Department of Statistics, University of Warwick, Coventry, UK.

NIHR Health Protection Research Unit in Genomics and Enabling Data, University of Warwick, Coventry, UK.

Publication information

Genome Biol. 2025 Jun 18;26(1):170. doi: 10.1186/s13059-025-03585-8.

Abstract

Here we introduce KPop, a novel versatile method based on full k-mer spectra and dataset-specific transformations, through which thousands of assembled or unassembled microbial genomes can be quickly compared. Unlike MinHash-based methods that produce distances and have lower resolution, KPop is able to accurately map sequences onto a low-dimensional space. Extensive validation on simulated and real-life viral and bacterial datasets shows that KPop can correctly separate sequences at both species and sub-species levels even when the overall genomic diversity is low. KPop also rapidly identifies related sequences and systematically outperforms MinHash-based methods.
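To make the idea of a full k-mer spectrum concrete, the following is a minimal sketch in Python. It is not KPop's implementation: the function names `kmer_spectrum` and `center_on_dataset`, the toy sequences, and the choice of k = 2 are all hypothetical. Mean-centering is used only as a simple stand-in for a dataset-specific transformation, since the abstract does not say which transformations KPop applies, and the projection onto a low-dimensional space is not shown.

```python
from collections import Counter
from itertools import product


def kmer_spectrum(seq, k):
    """Return normalized frequencies of all 4**k DNA k-mers, in lexicographic order."""
    seq = seq.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing characters other than A, C, G, T are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def center_on_dataset(spectra):
    """Subtract the per-k-mer mean over the dataset (illustrative assumption only)."""
    n = len(spectra)
    means = [sum(column) / n for column in zip(*spectra)]
    return [[x - m for x, m in zip(row, means)] for row in spectra]


# Hypothetical toy sequences, far shorter than any real genome.
toy_genomes = {
    "a": "ACGTACGTACGTAAAC",
    "b": "ACGTACGTACGTAAAG",
    "c": "GGGCCCGGGCCCGGGC",
}

spectra = [kmer_spectrum(s, 2) for s in toy_genomes.values()]
centered = center_on_dataset(spectra)
```

Each sequence becomes a vector with one entry per possible k-mer, so the representation keeps the whole spectrum instead of a sampled subset of hashes as in MinHash-style sketching.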

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dad4/12175428/75d4fa59ee86/13059_2025_3585_Fig1_HTML.jpg
