Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA.
Oxford Nanopore Technologies, Alameda, California 94501-1170, USA.
Genome Res. 2024 Nov 20;34(11):1987-1999. doi: 10.1101/gr.278960.124.
Direct nanopore-based RNA sequencing can be used to detect posttranscriptional base modifications, such as N6-methyladenosine (m6A) methylation, based on the electric current signals produced by the distinct chemical structures of modified bases. A key challenge is the scarcity of adequate training data with known methylation modifications. We present Xron, a hybrid encoder-decoder framework that delivers a direct methylation-distinguishing basecaller by training on synthetic RNA data and immunoprecipitation (IP)-based experimental data in two steps. First, we generate data with more diverse modification combinations through in silico cross-linking. Second, we use this data set to train an end-to-end neural network basecaller followed by fine-tuning on IP-based experimental data with label smoothing. The trained neural network basecaller outperforms existing methylation detection methods on both read-level and site-level prediction scores. Xron is a standalone, end-to-end m6A-distinguishing basecaller capable of detecting methylated bases directly from raw sequencing signals, enabling de novo methylome assembly.
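The abstract mentions fine-tuning on IP-based experimental data with label smoothing but gives no formulation. As a rough illustration only, the sketch below shows the standard uniform label-smoothing scheme, in which a one-hot target is mixed with a uniform distribution so that the network is not pushed toward full confidence on labels that may be noisy. The class names, the `epsilon` value, and the helper functions `smooth_labels` and `cross_entropy` are assumptions made for this example and are not taken from Xron.

```python
import math

# Hypothetical per-base classes for illustration: canonical A versus m6A.
CLASSES = ["A", "m6A"]


def smooth_labels(hard_label, num_classes, epsilon=0.1):
    """Mix a one-hot target with a uniform distribution.

    The labelled class receives 1 - epsilon + epsilon / num_classes,
    and every other class receives epsilon / num_classes.
    """
    uniform_share = epsilon / num_classes
    target = [uniform_share] * num_classes
    target[hard_label] += 1.0 - epsilon
    return target


def cross_entropy(probs, target):
    """Cross-entropy between predicted probabilities and a soft target."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# A base labelled m6A by the (possibly noisy) IP experiment.
soft_target = smooth_labels(CLASSES.index("m6A"), len(CLASSES), epsilon=0.1)

# Loss for a hypothetical model output that strongly favours m6A.
loss = cross_entropy([0.02, 0.98], soft_target)
```

With smoothing, the loss is minimized when the predicted distribution equals the soft target rather than a one-hot vector, which is one common motivation for using it when the training labels are only approximately correct.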