GenoM7GNet: An Efficient N7-Methylguanosine Site Prediction Approach Based on a Nucleotide Language Model.

Author information

Li Chuang, Wang Heshi, Wen Yanhua, Yin Rui, Zeng Xiangxiang, Li Keqin

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2024 Nov-Dec;21(6):2258-2268. doi: 10.1109/TCBB.2024.3459870. Epub 2024 Dec 10.

DOI:10.1109/TCBB.2024.3459870

Abstract

N7-methylguanosine (m7G), one of the major post-transcriptional RNA modifications, plays a highly significant role in medical treatment. However, classic approaches for identifying m7G sites are costly in both time and equipment. Meanwhile, existing machine learning methods extract limited hidden information from RNA sequences, which makes it difficult to improve accuracy. We therefore propose a deep learning network, called "GenoM7GNet," for m7G site identification. The model uses Bidirectional Encoder Representations from Transformers (BERT) pretrained on nucleotide sequence data to capture hidden patterns in RNA sequences for m7G site prediction. Moreover, through detailed comparative experiments with various deep learning models, we found that a one-dimensional convolutional neural network (CNN) exhibits outstanding performance in sequence feature learning and classification. In performance evaluation, the proposed GenoM7GNet model achieved an accuracy of 0.953, a sensitivity of 0.932, a specificity of 0.976, a Matthews correlation coefficient of 0.907, and an area under the receiver operating characteristic curve of 0.984. Extensive experimental results further show that GenoM7GNet markedly surpasses other state-of-the-art models in predicting m7G sites while exhibiting high computing performance.
