Yuge Chelsea Chen, Hang Ee Soon, Mamtha Madasamy Ravi Nadar, Vishwakarma Shashikant, Wang Sijia, Wang Cheng, Le Nguyen Quoc Khanh
NUS-ISS, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore.
Independent Researcher, Singapore, Singapore.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae688.
Accurate prediction of RNA modifications is important for elucidating RNA function and mechanism, with potential applications in drug development. Here, RNA-ModX provides a predictive model for post-transcriptional RNA modifications, complemented by a web application intended for use by other researchers. To maximize accuracy, a range of machine learning models was systematically explored, including Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Transformer-based architectures. The models were tested on a dataset of RNA sequences, each 1001 nucleotides long, composed of the four fundamental nucleotides (A, C, G, U) and spanning 12 prevalent modification classes (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um). Notably, the LSTM model with 3-mer encoding achieved the highest accuracy. Furthermore, Local Interpretable Model-Agnostic Explanations (LIME) were used to aid interpretation of the model's predictions and make them more transparent. Alongside model development, a web application was built with an interface through which researchers can upload RNA sequences. Upon submission, the model runs in the backend and the resulting predictions are presented to the user. Combining predictive modeling with a user-centric interface is intended to make RNA modification prediction more accessible to the broader research community.
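The abstract mentions 3-mer encoding of the input sequences but does not describe how it is implemented. A minimal sketch of one common approach follows, assuming overlapping 3-mers taken with stride 1 and mapped to integer indices suitable for an embedding layer. The names `KMER_TO_INDEX` and `encode_3mers`, the vocabulary ordering, and the use of index 0 for unknown tokens are all illustrative assumptions, not details taken from RNA-ModX.

```python
from itertools import product
from typing import List

NUCLEOTIDES = "ACGU"
K = 3

# Hypothetical vocabulary: all 4**3 = 64 possible 3-mers, indexed from 1
# so that 0 can be reserved for padding or for 3-mers with unknown bases.
KMER_TO_INDEX = {
    "".join(kmer): i + 1
    for i, kmer in enumerate(product(NUCLEOTIDES, repeat=K))
}


def encode_3mers(sequence: str) -> List[int]:
    """Split a sequence into overlapping 3-mers (stride 1) and map each to an index.

    DNA-style 'T' is converted to 'U'; any 3-mer containing a character
    outside A, C, G, U is mapped to 0 (an assumption of this sketch).
    """
    seq = sequence.upper().replace("T", "U")
    return [
        KMER_TO_INDEX.get(seq[i:i + K], 0)
        for i in range(len(seq) - K + 1)
    ]
```

Under these assumptions, a 1001-nucleotide sequence yields 999 tokens, each drawn from a vocabulary of 64 3-mers. Non-overlapping 3-mers or a one-hot representation would be equally plausible readings of "3-mer encoding".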