NBCR-ac4C: A Deep Learning Framework Based on Multivariate BERT for Human mRNA N4-Acetylcytidine Sites Prediction.

Author information

School of Artificial Intelligence, Hebei University of Technology, Tianjin 300400, China.

School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China.

Publication information

J Chem Inf Model. 2024 Oct 28;64(20):8074-8081. doi: 10.1021/acs.jcim.4c01415. Epub 2024 Oct 5.

Abstract

N4-acetylcytidine (ac4C) plays a crucial role in regulating cellular biological processes, particularly in gene expression regulation and disease development. However, wet-lab experiments to identify ac4C are time-consuming and costly, and existing learning-based methods struggle to capture the underlying semantic knowledge and relations within sequences. To address this, we propose NBCR-ac4C, a deep learning approach based on pretrained models. Specifically, we employ Nucleotide Transformer and DNABERT2 to construct contextual embeddings of nucleotide sequences, which effectively mine and express contextual relations between different features in the sequence. A convolutional neural network (CNN) and ResNet18 are then applied to further extract shallow and deep knowledge from the contextual embeddings. In extensive experiments on the prediction of ac4C sites in nucleotide sequences, NBCR-ac4C outperforms general learning-based models. It achieves the highest accuracy (ACC) of 83.51% and an area under the receiver operating characteristic curve (AUROC) of 89.58% on an independent test set. Moreover, compared to the current state-of-the-art (SOTA) model LSA-ac4C, the proposed model achieves higher ACC and AUROC by 0.81-3.7% and 0.05-1.58%, respectively. The data set and code are available at https://github.com/2103374200/NBCR to facilitate further discussion of NBCR-ac4C.
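For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how ACC and AUROC can be computed for a binary ac4C site classifier. It is not taken from the NBCR repository: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of sequences whose thresholded score matches the true label.

    The 0.5 threshold is an illustrative assumption, not a value from the paper.
    """
    if len(labels) != len(scores) or len(labels) == 0:
        raise ValueError("labels and scores must be non-empty and equally long")
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as one half (the Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = ac4C site, 0 = non-site.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

print(f"ACC   = {accuracy(labels, scores):.3f}")  # 0.667
print(f"AUROC = {auroc(labels, scores):.3f}")     # 0.889
```

ACC depends on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why abstracts such as this one commonly report both. The pairwise loop is quadratic in the number of examples; for larger test sets a rank-based implementation would be preferable.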
