IEEE Trans Biomed Eng. 2022 Aug;69(8):2557-2568. doi: 10.1109/TBME.2022.3150420. Epub 2022 Jul 18.
The N6-methyladenosine (m6A) modification is the most common ribonucleic acid (RNA) modification. In Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), it plays a role in promoting viral gene mutation and changes in protein structure. Nanopore single-molecule direct RNA sequencing (DRS) supports RNA modification detection because, unlike second-generation sequencing, it can preserve the potential m6A signature. However, because DRS data are still scarce, few methods exist for detecting m6A modifications in DRS reads. Our aim is to identify m6A modifications in DRS data precisely.
We present a methodology for identifying m6A modifications that combines read mapping with feature extraction from DRS data. For detection, we introduce an ensemble method called mixed-weight neural bagging (MWNB), trained on synthetic DRS data from 5-base RNA sequences containing both modified (m6A) and unmodified adenosines.
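The abstract does not describe how MWNB trains or weights its members, so the sketch below only illustrates the generic idea behind weighted bagging and should not be read as the authors' method. Several assumptions are made: logistic regression stands in for the neural base learners, each member's weight is its out-of-bag accuracy (a hypothetical choice, not necessarily the paper's "mixed weight"), and the three-column feature matrix is synthetic noise rather than real DRS signal features. The helper names (`train_logistic`, `weighted_bagging`, `ensemble_proba`) are likewise hypothetical.

```python
import numpy as np

def train_logistic(X, y, lr=0.1, epochs=200):
    """Logistic regression fitted by gradient descent.
    Assumption: a stand-in for the neural base learners used in MWNB."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def predict_proba(model, X):
    w, b = model
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def weighted_bagging(X, y, n_models=5, seed=0):
    """Train members on bootstrap samples and weight each by its
    out-of-bag accuracy (hypothetical weighting scheme)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    models, weights = [], []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)        # bootstrap sample, with replacement
        oob = np.setdiff1d(np.arange(n), idx)   # rows left out of this sample
        model = train_logistic(X[idx], y[idx])
        if len(oob) > 0:
            acc = np.mean((predict_proba(model, X[oob]) >= 0.5) == y[oob])
        else:
            acc = 1.0
        models.append(model)
        weights.append(acc)
    weights = np.asarray(weights) + 1e-9        # avoid an all-zero weight vector
    return models, weights / weights.sum()      # normalize so weights sum to 1

def ensemble_proba(models, weights, X):
    """Weighted average of member probabilities."""
    probs = np.stack([predict_proba(m, X) for m in models])  # (n_models, n_samples)
    return weights @ probs

# Synthetic demo: class 0 ("unmodified") vs class 1 ("modified") feature vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.0, size=(100, 3)),
               rng.normal(1.0, 1.0, size=(100, 3))])
y = np.concatenate([np.zeros(100), np.ones(100)])

models, weights = weighted_bagging(X, y)
p = ensemble_proba(models, weights, X)
acc = np.mean((p >= 0.5) == y)
print(f"training accuracy: {acc:.3f}")
```

Bagging reduces the variance of individual learners by averaging models trained on resampled data; weighting the average lets stronger members contribute more. How MWNB actually mixes its weights would have to be taken from the full paper.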
Our MWNB model achieved the highest classification accuracy, 97.85%, and an AUC of 0.9968. We also applied the MWNB model to a COVID-19 dataset, and the results agree closely with findings from biomedical experiments.
Our strategy enables m6A modifications to be predicted from DRS data and identifies m6A modifications on SARS-CoV-2.
The Coronavirus Disease 2019 (COVID-19) outbreak, caused by SARS-CoV-2, has had a significant impact. An RNA modification called m6A is associated with viral infection. The presence of m6A modifications related to several essential proteins affects those proteins' structure and function. Therefore, determining the location and number of m6A modifications is crucial for subsequent analysis of protein expression profiles.