Wang Jun, Wang Liangjiang
Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29631, USA.
NAR Genom Bioinform. 2020 Feb 7;2(1):lqaa007. doi: 10.1093/nargab/lqaa007. eCollection 2020 Mar.
N6-adenosine methylation (m6A) is the most abundant internal RNA modification in eukaryotes, and affects RNA metabolism and non-coding RNA function. Previous studies suggest that m6A modifications in mammals occur on the consensus sequence DRACH (D = A/G/U, R = A/G, H = A/C/U). However, only about 10% of such adenosines can be m6A-methylated, and the underlying sequence determinants are still unclear. Notably, the regulation of m6A modifications can be cell-type-specific. In this study, we have developed a deep learning model, called TDm6A, to predict RNA m6A modifications in human cells. For cell types with limited availability of m6A data, transfer learning may be used to enhance TDm6A model performance. We show that TDm6A can learn common and cell-type-specific motifs, some of which are associated with RNA-binding proteins previously reported to be m6A readers or anti-readers. In addition, we have used TDm6A to predict m6A sites on human long non-coding RNAs (lncRNAs) for selection of candidates with high levels of m6A modifications. The results provide new insights into m6A modifications on human protein-coding and non-coding transcripts.
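As an illustration of the DRACH consensus described above, the following minimal Python sketch locates candidate adenosines in an RNA sequence. It is not part of TDm6A; the function name `find_drach_sites` and the choice to accept DNA-style input (T treated as U) are assumptions made for this example. A match marks only a candidate site, since according to the abstract only about 10% of such adenosines can be methylated.

```python
import re

# DRACH consensus: D = A/G/U, R = A/G, A, C, H = A/C/U.
# A lookahead is used so that overlapping motif occurrences are all reported.
DRACH_PATTERN = re.compile(r"(?=([AGU][AG]AC[ACU]))")


def find_drach_sites(sequence):
    """Return 0-based positions of the central adenosine of each DRACH motif.

    Hypothetical helper for illustration only: the input is upper-cased and
    T is converted to U so that DNA-style sequences are also accepted.
    """
    rna = sequence.upper().replace("T", "U")
    # The motif starts at match.start(); the candidate A is the third base.
    return [match.start() + 2 for match in DRACH_PATTERN.finditer(rna)]


if __name__ == "__main__":
    example = "AGACAGACU"
    for position in find_drach_sites(example):
        print(position, example[position - 2:position + 3])
```

A predictor such as the one described in the abstract would then score each candidate position using its surrounding sequence context; this sketch covers only the motif scan.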