Assembly-free typing of Nanopore and Illumina data through proximity scoring with KMA.

Author information

Clausen Philip T L C, Hallgren Malte B, Overballe-Petersen Søren, Marcelino Vanessa R, Hasman Henrik, Aarestrup Frank M

Affiliations

Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark.

Department of Bacteria, Parasites, and Fungi, Statens Serum Institut, 2300 Copenhagen, Denmark.

Publication information

NAR Genom Bioinform. 2025 Sep 1;7(3):lqaf116. doi: 10.1093/nargab/lqaf116. eCollection 2025 Sep.

DOI:10.1093/nargab/lqaf116

PMID:40918069

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12408904/

Abstract

Advances in Oxford Nanopore Technologies (ONT), notably the introduction of the r10.4.1 flow cell, have reduced sequencing error rates to <1%. When the reference sequence is known, this allows variant calling with an accuracy comparable to that of second-generation short-read sequencing technologies such as Illumina. Additionally, the longer reads produced by ONT enable more efficient mapping, which reduces the number of multimapping reads. However, multimapping remains a concern when the correct reference is not known in advance and the target reference is highly similar to other references. KMA's ConClave algorithm has provided an accurate solution to the multimapping problem for second-generation short-read data. It is less effective, however, at resolving the multimapping problems that arise with third-generation long-read data. To overcome this, we introduce proximity scoring of alleles. This helps the ConClave algorithm accurately assign specific alleles from databases containing loci with a high degree of redundancy. Using multilocus sequence typing as a test case, we show that this approach matches the results obtained from Illumina sequencing data. It does so while using limited computational resources that essentially correspond to those of today's smartphones.
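To make the multimapping problem concrete, here is a toy Python sketch. It shows a read that matches two near-identical alleles equally well, and how a longer read spanning the distinguishing position can resolve the tie.

Everything in this sketch is a hypothetical assumption chosen for illustration:

* the two 30-bp alleles;
* the ungapped mismatch count;
* the `best_alleles` helper.

It is not KMA's ConClave algorithm or its proximity scoring.

```python
"""Toy illustration of read multimapping between highly similar alleles.

Hypothetical example: the alleles, reads and ungapped mismatch scoring are
invented for illustration. This is NOT KMA's ConClave algorithm or its
proximity scoring of alleles.
"""

ALLELE_1 = "ATGGCTAAGCTTGACCGTAGGCTAACGTTA"  # 30 bp, index 20 is "G"
SNP_POS = 20
assert ALLELE_1[SNP_POS] != "T"
# ALLELE_2 differs from ALLELE_1 by a single SNP (G -> T at index 20).
ALLELE_2 = ALLELE_1[:SNP_POS] + "T" + ALLELE_1[SNP_POS + 1:]
ALLELES = {"allele_1": ALLELE_1, "allele_2": ALLELE_2}


def mismatches(read: str, allele: str, start: int) -> int:
    """Count mismatches of an ungapped read placed at `start` on `allele`."""
    window = allele[start:start + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the allele")
    return sum(a != b for a, b in zip(read, window))


def best_alleles(read: str, alleles: dict[str, str], start: int) -> list[str]:
    """Return all alleles tied for the fewest mismatches.

    More than one name in the result means the read multimaps.
    """
    scores = {name: mismatches(read, seq, start) for name, seq in alleles.items()}
    best = min(scores.values())
    return sorted(name for name, score in scores.items() if score == best)


if __name__ == "__main__":
    short_read = ALLELE_1[:12]  # does not cover the SNP
    long_read = ALLELE_1        # covers the SNP
    # Long read with a sequencing error ("A") exactly at the SNP position.
    noisy_long_read = ALLELE_1[:SNP_POS] + "A" + ALLELE_1[SNP_POS + 1:]

    print(best_alleles(short_read, ALLELES, 0))       # ['allele_1', 'allele_2']
    print(best_alleles(long_read, ALLELES, 0))        # ['allele_1']
    print(best_alleles(noisy_long_read, ALLELES, 0))  # ['allele_1', 'allele_2']
```

The short read is ambiguous because it never reaches the only position that separates the alleles. The full-length read resolves the tie. The noisy long read shows why very similar references remain a problem even for long reads: a single error at an informative position restores the tie. The toy does not model how KMA actually resolves such cases.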