Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.
Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China.
Nucleic Acids Res. 2021 Apr 19;49(7):3719-3734. doi: 10.1093/nar/gkab124.
N6-methyladenosine (m6A) is the most pervasive modification in eukaryotic mRNAs. Numerous biological processes are regulated by this critical post-transcriptional mark, such as gene expression, RNA stability, RNA structure and translation. Recently, various experimental techniques and computational methods have been developed to characterize the transcriptome-wide landscapes of m6A modification and to understand its underlying mechanisms and functions in mRNA regulation. However, the experimental techniques are generally costly and time-consuming, while the existing computational models are usually designed only for m6A site prediction in a single species and have significant limitations in accuracy, interpretability and generalizability. Here, we propose a highly interpretable computational framework, called MASS, based on a multi-task curriculum learning strategy to capture m6A features across multiple species simultaneously. Extensive computational experiments demonstrate the superior performance of MASS compared with state-of-the-art prediction methods. Furthermore, the contextual sequence features of m6A captured by MASS can be explained by the known critical binding motifs of the related RNA-binding proteins, which also helps elucidate the similarities and differences among m6A features across species. In addition, based on the predicted m6A profiles, we further delineate the relationships between m6A and various properties of gene regulation, including gene expression, RNA stability, translation, RNA structure and histone modification. In summary, MASS may serve as a useful tool for characterizing m6A modification and studying its regulatory code. The source code of MASS can be downloaded from https://github.com/mlcb-thu/MASS.