Yi Mengyue, Zhou Fenglin, Deng Yu
School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China.
Front Genet. 2024 May 30;15:1408688. doi: 10.3389/fgene.2024.1408688. eCollection 2024.
N4-acetylcytidine (ac4C) is a chemical modification of mRNA in which an acetyl group is added to the N4 position of cytosine, altering the structure and function of the mRNA. Studies have shown that ac4C is closely associated with the occurrence and development of various cancers. Accurate prediction of ac4C modification sites on human mRNA is therefore crucial for revealing its role in disease and for developing new diagnostic and therapeutic strategies. However, existing deep learning models remain limited in prediction accuracy and generalization ability, which restricts their effectiveness on complex biological sequence data. This paper introduces STM-ac4C, a deep learning model for predicting ac4C modification sites on human mRNA. The model combines selective kernel convolution, temporal convolutional networks, and multi-head self-attention to extract and integrate multi-level features of RNA sequences, thereby achieving high-precision prediction of ac4C sites. On the independent test dataset, STM-ac4C improved accuracy, Matthews correlation coefficient, and area under the curve by 1.81%, 3.5%, and 0.37%, respectively, over the existing state-of-the-art methods. Its performance on additional balanced and imbalanced datasets further supports the model's robustness and generalization ability. Across these experiments, STM-ac4C outperforms existing methods in predictive performance. In summary, STM-ac4C performs well in predicting ac4C modification sites on human mRNA, providing a new tool for understanding the biological significance of mRNA modifications and their relevance to cancer treatment. Additionally, sequence region impact analysis with the model reveals key sequence features that influence the prediction of ac4C sites, offering new perspectives for future research.
The source code and experimental data are available at https://github.com/ymy12341/STM-ac4C.
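For readers less familiar with two of the reported metrics, the following is a minimal Python sketch of how accuracy and the Matthews correlation coefficient (MCC) can be computed from binary labels. The function names and the toy labels are illustrative assumptions; they are not taken from the STM-ac4C repository and do not reproduce any result reported in the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = ac4C site, 0 = non-site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the confusion matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical toy labels, purely for illustration.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"MCC      = {mcc(y_true, y_pred):.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside accuracy when datasets are imbalanced.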