Department of Physics, School of Science, Tianjin University, Tianjin 300072, China.
Department of Physics, School of Science, Tianjin University, Tianjin 300072, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin 300072, China.
Int J Biol Macromol. 2023 Dec 31;253(Pt 3):126837. doi: 10.1016/j.ijbiomac.2023.126837. Epub 2023 Sep 13.
N4-acetylcytidine (ac4C) is a vital constituent of the epitranscriptome and plays a crucial role in regulating mRNA expression. Numerous studies have established correlations between ac4C and the incidence, progression and prognosis of various cancers. Accurately predicting ac4C sites is therefore an important step towards understanding the biological functions of this modification and devising effective therapeutic interventions. Wet-lab experiments are the primary methods for studying ac4C, but computational methods have emerged as a promising supplement because of their cost-effectiveness and shorter research cycles. However, current models remain limited in predictive performance and generalization ability. Here, we used automated machine learning to establish a reliable baseline and constructed a deep hybrid neural network, LSA-ac4C, which combines a double-layer Long Short-Term Memory (LSTM) network with a self-attention mechanism for accurate ac4C site prediction. Benchmarking comparisons demonstrate that LSA-ac4C outperforms the current state-of-the-art method, with accuracy (ACC), Matthews correlation coefficient (MCC) and area under the receiver operating characteristic curve (AUROC) improving by 2.89%, 5.96% and 1.53%, respectively, on an independent test set. Overall, LSA-ac4C serves as a powerful tool for predicting ac4C sites in human mRNA, thus benefiting research on RNA modification. For the convenience of the research community, a web server has been established at http://tubic.org/ac4C.
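For readers unfamiliar with the three metrics reported above, the following is a minimal sketch of how ACC, MCC and AUROC are commonly computed for a binary site classifier. It is not taken from the LSA-ac4C code base; the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions chosen for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """ACC: fraction of sites classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; defined as 0.0 when the denominator vanishes."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = ac4C site, 0 = non-site.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
# Assumed decision threshold of 0.5 for turning scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred), mcc(y_true, y_pred), auroc(y_true, scores))
```

ACC and MCC depend on the chosen threshold, whereas AUROC is computed from the raw scores and is threshold-free, which is one reason the three are often reported together.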