Teng Haotian, Stoiber Marcus, Bar-Joseph Ziv, Kingsford Carl
Computational Biology Department, Carnegie Mellon University, Pittsburgh PA 15213, USA.
Oxford Nanopore Technologies.
bioRxiv. 2024 Jan 7:2024.01.06.574484. doi: 10.1101/2024.01.06.574484.
Direct nanopore-based RNA sequencing can be used to detect post-transcriptional base modifications, such as m6A methylation, based on the electric current signals produced by the distinct chemical structures of modified bases. A key challenge is the scarcity of adequate training data with known methylation modifications. We present Xron, a hybrid encoder-decoder framework that delivers a direct methylation-distinguishing basecaller by training on synthetic RNA data and immunoprecipitation-based experimental data in two steps. First, we generate data with more diverse modification combinations through in silico cross-linking. Second, we use this dataset to train an end-to-end neural network basecaller, which is then fine-tuned on immunoprecipitation-based experimental data with label smoothing. The trained neural network basecaller outperforms existing methylation detection methods on both read-level and site-level prediction scores. Xron is a standalone, end-to-end m6A-distinguishing basecaller capable of detecting methylated bases directly from raw sequencing signals, enabling de novo methylome assembly.
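The abstract mentions fine-tuning with label smoothing but does not give the formulation. As a minimal sketch of the general technique only (not Xron's implementation), the snippet below replaces a one-hot target with a softened distribution, which may help when read-level labels derived from immunoprecipitation experiments are imperfect. The function names, the two-class setup (unmodified A versus m6A), and the value `epsilon=0.1` are all illustrative assumptions.

```python
import math


def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Build a smoothed target: (1 - epsilon) on the labeled class,
    plus epsilon spread uniformly over all classes.

    epsilon=0.1 is an illustrative default, not a value from the paper.
    """
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")
    if not 0 <= target_index < num_classes:
        raise ValueError("target_index out of range")
    dist = [epsilon / num_classes] * num_classes
    dist[target_index] += 1.0 - epsilon
    return dist


def cross_entropy(predicted, target):
    """Cross-entropy between a predicted distribution and a target distribution."""
    # Clip probabilities away from zero so log() is always defined.
    return -sum(t * math.log(max(p, 1e-12)) for p, t in zip(predicted, target))


# Hypothetical two-class example: index 0 = unmodified A, index 1 = m6A.
hard = smooth_labels(1, 2, epsilon=0.0)   # ordinary one-hot target
soft = smooth_labels(1, 2, epsilon=0.1)   # smoothed target

# An overconfident prediction is penalized more under the smoothed target.
confident = [0.001, 0.999]
loss_hard = cross_entropy(confident, hard)
loss_soft = cross_entropy(confident, soft)
```

With smoothing, the loss no longer rewards pushing the predicted probability of the labeled class all the way to 1, so a fraction of mislabeled reads has less influence on the fine-tuned model.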