DNA sequences performs as natural language processing by exploiting deep learning algorithm for the identification of N4-methylcytosine.

Affiliations

Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea.

School of International Engineering and Science, Jeonbuk National University, Jeonju, 54896, South Korea.

Publication information

Sci Rep. 2021 Jan 8;11(1):212. doi: 10.1038/s41598-020-80430-x.

DOI:10.1038/s41598-020-80430-x

PMID:33420191

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7794489/

Abstract

N4-methylcytosine (4mC) is a biochemical modification of DNA that affects genetic operations such as gene expression, genomic imprinting, chromosome stability, and cell development without modifying the DNA nucleotides. In this work, we propose 4mCNLP-Deep, a computational model that represents sequences as vectors through word embedding and uses a deep-learning-based CNN to predict 4mC and non-4mC sites in the C. elegans genome dataset. A range of experimental settings, such as the corpus k-mer size and the number of folds in k-fold cross-validation, was explored to find the best-performing configuration. Using a 3-mer corpus with word2vec and 3-fold cross-validation, 4mCNLP-Deep outperforms the state-of-the-art predictor on five evaluation metrics: accuracy (ACC) of 0.9354, Matthews correlation coefficient (MCC) of 0.8608, specificity (Sp) of 0.8996, sensitivity (Sn) of 0.9563, and area under the curve (AUC) of 0.9731, corresponding to increments of 1.1%, 0.6%, 0.58%, 0.77%, and 4.89%, respectively. Finally, we developed an online web server, http://nsclbio.jbnu.ac.kr/tools/4mCNLP-Deep/, so that experimental researchers can obtain results easily.
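As a rough illustration of two ideas the abstract relies on, the sketch below splits a DNA sequence into k-mer "words" (the kind of token a word2vec corpus could be built from) and computes ACC, Sn, Sp, and MCC from a binary confusion matrix. The function names `kmer_tokens` and `binary_metrics` are hypothetical, the overlapping stride-1 tokenization is an assumption rather than the authors' confirmed preprocessing, and the counts in the example are made up for demonstration, not results from the paper.

```python
import math


def kmer_tokens(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1, an assumption)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def binary_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes at least one positive and one negative sample are present.
    """
    acc = (tp + tn) / (tp + tn + fp + fn)   # accuracy
    sn = tp / (tp + fn)                     # sensitivity (recall on 4mC sites)
    sp = tn / (tn + fp)                     # specificity (recall on non-4mC sites)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation
    return {"ACC": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


if __name__ == "__main__":
    # Toy sequence and illustrative counts only.
    print(kmer_tokens("ACGTAC"))            # ['ACG', 'CGT', 'GTA', 'TAC']
    print(binary_metrics(tp=90, tn=80, fp=20, fn=10))
```

In a pipeline of this kind, each sequence would become a list of 3-mer tokens that an embedding model maps to vectors before they reach the CNN; the details of that step are not given in the abstract.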
