ReQuant: improved base modification calling by k-mer value imputation.

Author information

Straver Roy, Vermeulen Carlo, Verity-Legg Joe R, Pagès-Gallego Marc, Stoker Dieter G G, van Oudenaarden Alexander, de Ridder Jeroen

Affiliations

Center for Molecular Medicine, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands.

Oncode Institute, 3521 AL Utrecht, The Netherlands.

Publication information

Nucleic Acids Res. 2025 May 10;53(9). doi: 10.1093/nar/gkaf323.

PMID:40347136

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12065109/

Abstract

Nanopore sequencing allows identification of base modifications, such as methylation, directly from raw current data. Prevailing approaches, including deep learning (DL) methods, require training data covering all possible sequence contexts. These data can be prohibitively expensive or impossible to obtain for some modifications. Hence, research into DNA modifications focuses on the most prevalent modification in human DNA: 5mC in a CpG context. Improved generalization is required to reach the technology's full potential: calling any modification from raw current values. We developed ReQuant, an algorithm to impute full k-mer-based modification models from training data that cover only a limited set of k-mer contexts. ReQuant is highly accurate for calling modifications (CpG/GpC methylation and CpG glucosylation) in lambda phage R9 data when fitted on ≤25% of all possible modified 6-mers, and it extends to human R10 data. The success of our approach shows that DNA modifications have a consistent, and therefore predictable, effect on Nanopore current levels, suggesting that interpretable rule-based imputation in unseen contexts is possible. Our approach circumvents the need for modification-specific DL tools and enables modification calling when not all sequence contexts can be obtained, opening a vast field of biological base modification research.
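The abstract does not spell out ReQuant's model. As a rough illustration of the general idea of k-mer value imputation, and not the published ReQuant algorithm, the minimal sketch below assumes that a modification shifts a 6-mer's mean current by an amount that depends only on the position of the modified base within the k-mer. All function names, the synthetic current levels (in pA) and the per-position shifts are hypothetical. The sketch fits one shift per position from 25% of the CpG-containing 6-mers and imputes modified levels for the remaining 75%.

```python
import itertools
import random
from collections import defaultdict
from statistics import mean

K = 6  # k-mer length, matching the 6-mers mentioned in the abstract


def cpg_kmers(k=K):
    """Return (k-mer, pos) pairs for every k-mer over ACGT that contains a CpG.

    pos is the index of the C in the first CpG; in this toy setup that C is
    assumed to be the single modified base.
    """
    pairs = []
    for letters in itertools.product("ACGT", repeat=k):
        kmer = "".join(letters)
        pos = kmer.find("CG")
        if pos != -1:
            pairs.append((kmer, pos))
    return pairs


def simulate_levels(pairs, true_shift, noise_sd=0.3, seed=0):
    """Synthetic unmodified/modified mean currents (pA); hypothetical values only."""
    rng = random.Random(seed)
    unmod, mod = {}, {}
    for kmer, pos in pairs:
        unmod[kmer] = rng.uniform(60.0, 120.0)
        mod[kmer] = unmod[kmer] + true_shift[pos] + rng.gauss(0.0, noise_sd)
    return unmod, mod


def fit_position_shifts(train, unmod, mod):
    """Average observed shift (modified - unmodified) per position of the modified base."""
    deltas = defaultdict(list)
    for kmer, pos in train:
        deltas[pos].append(mod[kmer] - unmod[kmer])
    return {pos: mean(values) for pos, values in deltas.items()}


def impute_modified(pairs, unmod, shifts):
    """Predict modified levels for unseen k-mers: unmodified level plus fitted shift."""
    return {kmer: unmod[kmer] + shifts.get(pos, 0.0) for kmer, pos in pairs}


def mean_abs_error(pred, truth):
    """Mean absolute difference between predicted and true levels."""
    return mean(abs(pred[kmer] - truth[kmer]) for kmer in pred)


if __name__ == "__main__":
    TRUE_SHIFT = [-1.0, -2.5, -4.0, -3.0, -1.5]  # hypothetical effect per position 0..4
    pairs = cpg_kmers()
    unmod, mod = simulate_levels(pairs, TRUE_SHIFT)
    random.Random(1).shuffle(pairs)
    n_train = len(pairs) // 4  # fit on 25% of the modified k-mers
    train, held_out = pairs[:n_train], pairs[n_train:]
    shifts = fit_position_shifts(train, unmod, mod)
    pred = impute_modified(held_out, unmod, shifts)
    truth = {kmer: mod[kmer] for kmer, _ in held_out}
    baseline = {kmer: unmod[kmer] for kmer, _ in held_out}  # ignores the modification
    print(f"imputed MAE:  {mean_abs_error(pred, truth):.2f} pA")
    print(f"baseline MAE: {mean_abs_error(baseline, truth):.2f} pA")
```

Running it prints the mean absolute error of the imputed levels next to a baseline that ignores the modification; on this synthetic data the imputed error should be roughly the size of the simulated noise. A position-only rule is the simplest "consistent effect" one could assume; richer rules (for example, conditioning on neighbouring bases) would plug into `fit_position_shifts` and `impute_modified` in the same way.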
