State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China.
Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China.
Plant Physiol. 2024 May 31;195(2):1200-1213. doi: 10.1093/plphys/kiae120.
N6-methyladenosine (m6A), the most prevalent modification in eukaryotic mRNAs, is involved in gene expression regulation and many RNA metabolism processes. Accurate prediction of m6A modification is important for understanding its molecular mechanisms in different biological contexts. However, most existing models have a limited range of application and are species-centric. Here we present PEA-m6A, a unified, modularized, and parameterized framework that streamlines m6A-Seq data analysis for predicting m6A-modified regions in plant genomes. The PEA-m6A framework builds ensemble learning-based m6A prediction models with statistics-based and deep learning-driven features, achieving an improvement of 6.7% to 23.3% in the area under the precision-recall curve over the state-of-the-art regional-scale m6A predictor WeakRM in 12 plant species. Notably, PEA-m6A can leverage knowledge from pretrained models via transfer learning, an innovation that improves the prediction accuracy of m6A modifications in small-sample training tasks. PEA-m6A also generalizes well, making it suitable for both within-species and cross-species m6A prediction. Overall, this study presents a promising m6A prediction tool, PEA-m6A, with outstanding performance in terms of accuracy, flexibility, transferability, and generalization ability. PEA-m6A has been packaged using Galaxy and Docker technologies for ease of use and is publicly available at https://github.com/cma2015/PEA-m6A.
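For readers unfamiliar with the evaluation metric named above, the following is a minimal sketch of how an area under the precision-recall curve can be computed for an ensemble of scoring models. It is not the PEA-m6A implementation: the function names `soft_vote` and `average_precision`, the simple score averaging, the step-wise (average precision) form of the area, and the toy scores are all assumptions made for illustration.

```python
def soft_vote(score_lists):
    """Average per-region scores across models (a simple, assumed ensemble rule)."""
    n_models = len(score_lists)
    return [sum(scores) / n_models for scores in zip(*score_lists)]


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    labels: 1 for an m6A-modified region, 0 otherwise.
    scores: predicted scores, higher meaning more likely modified.
    Tied scores are ranked in input order; this sketch does not treat them specially.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank regions from the highest to the lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            total += true_pos / rank  # precision at each newly recovered positive
    return total / n_pos


# Toy data (hypothetical): four candidate regions scored by two models.
labels = [1, 0, 1, 0]
model_a = [0.9, 0.2, 0.6, 0.1]
model_b = [0.5, 0.6, 0.8, 0.3]
ensemble = soft_vote([model_a, model_b])

for name, scores in [("model_a", model_a), ("model_b", model_b), ("ensemble", ensemble)]:
    print(name, round(average_precision(labels, scores), 4))
```

The precision-recall view is commonly preferred over ROC curves when modified regions are a minority of the candidates, because it is sensitive to false positives among the top-ranked predictions.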