

4mCPred-CNN-Prediction of DNA N4-Methylcytosine in the Mouse Genome Using a Convolutional Neural Network.

Affiliations

Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Korea.

Institute of Avionics and Aeronautics (IAA), Air University, Islamabad 44000, Pakistan.

Publication information

Genes (Basel). 2021 Feb 20;12(2):296. doi: 10.3390/genes12020296.

Abstract

Among DNA modifications, N4-methylcytosine (4mC) is one of the most significant, and it is linked to cell proliferation and gene expression. Understanding its different biological functions requires accurate detection of 4mC sites. Although several techniques based on machine learning (ML) and convolutional neural networks (CNNs) exist for predicting 4mC sites in different genomes, there is no CNN-based tool for identifying 4mC sites in the mouse genome. In this article, a CNN-based model named 4mCPred-CNN was developed to classify 4mC locations in the mouse genome. Until now, only two ML-based models were available for this purpose; they relied on several feature encoding schemes and still left considerable room to improve prediction accuracy. Using only a single feature encoding scheme, one-hot encoding, we outperformed both of the previous ML-based techniques. In a ten-fold cross-validation test, the proposed model, 4mCPred-CNN, achieved an accuracy of 85.71% and a Matthews correlation coefficient (MCC) of 0.717. On an independent dataset, it achieved an accuracy of 87.50% with an MCC of 0.750. These results indicate that the proposed model can be of great use to researchers in biology and bioinformatics.
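The abstract names one-hot encoding as the only feature scheme and reports accuracy and MCC as the evaluation metrics. The following is a minimal sketch of both ideas, not the authors' implementation. The channel order `ACGT`, the all-zero treatment of unknown bases, the function names, and the confusion-matrix counts in the usage note are all assumptions made for illustration; the abstract does not give any of them.

```python
import math

# Assumed channel order; the abstract does not state which order was used.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Turn a DNA string into a list of 4-element 0/1 rows, one row per base."""
    encoded = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        index = NUCLEOTIDES.find(base)
        if index >= 0:
            row[index] = 1
        # Assumption: symbols outside A/C/G/T (for example N) stay all-zero.
        encoded.append(row)
    return encoded


def accuracy_and_mcc(tp, tn, fp, fn):
    """Compute accuracy and Matthews correlation coefficient from counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc
```

As a usage example, `one_hot_encode("ACGT")` yields a 4 x 4 identity-like matrix, which is the kind of input a 1D CNN can consume. With hypothetical counts `tp=70, tn=70, fp=10, fn=10` (not taken from the paper), `accuracy_and_mcc` returns an accuracy of 0.875 and an MCC of 0.75, which shows how the two metrics relate on a balanced test set.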


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3abf/7924022/c70e12a2d356/genes-12-00296-g001.jpg
