DCNN-4mC: Densely connected neural network based N4-methylcytosine site prediction in multiple species.

Authors

Rehman Mobeen Ur, Tayara Hilal, Chong Kil To

Affiliations

Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea.

Department of Avionics Engineering, Air University, Islamabad 44000, Pakistan.

Publication information

Comput Struct Biotechnol J. 2021 Nov 1;19:6009-6019. doi: 10.1016/j.csbj.2021.10.034. eCollection 2021.

Abstract

DNA N4-methylcytosine (4mC) is a significant genetic modification that plays a major role in controlling biological functions, namely DNA replication, DNA repair, gene regulation, and gene expression levels. Identifying 4mC sites is important for gaining insight into different biological mechanisms. However, identifying modifications with experimental methods is challenging because the techniques are expensive and time-consuming. Computational tools are therefore a promising option for identifying modifications. Various computational tools have been proposed in the literature, but their generalization and prediction performance require improvement. For this reason, we propose a neural-network-based tool named DCNN-4mC for identifying 4mC sites. The proposed model consists of a set of neural network layers with a skip connection, which allows shallow features to be shared with the dense layers. The skip connection makes it possible to gather crucial information about 4mC sites. In the literature, different models have been applied to different species, so in many cases several datasets are available for a single species. In this research, we combined all available datasets to create a single benchmark dataset for each species. To the best of our knowledge, no model in the literature has been applied to more than six different species. To ensure the generalizability of DCNN-4mC, we used 12 different species for performance evaluation. DCNN-4mC attained 2% to 14% higher accuracy than state-of-the-art tools on all available datasets of the different species. Furthermore, independent test datasets were also used, and DCNN-4mC yielded high overall performance on them as well.
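
The idea of a skip connection that shares shallow features with the dense layers can be illustrated with a minimal, pure-Python sketch. This is not the authors' DCNN-4mC architecture: the one-hot input encoding, the two convolution stages, the kernel widths, the filter counts, the random untrained weights, and all function names are assumptions chosen only to show how shallow and deep features can be concatenated before a dense output layer.

```python
import math
import random

# Assumed input encoding: one-hot vectors over A, C, G, T.
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}


def rand_kernels(rng, n_filters, width, channels):
    """Hypothetical helper: small random (untrained) convolution kernels."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(channels)]
             for _ in range(width)] for _ in range(n_filters)]


def conv1d_relu(x, kernels):
    """Valid 1D convolution along the sequence, followed by ReLU."""
    width, channels = len(kernels[0]), len(x[0])
    out = []
    for i in range(len(x) - width + 1):
        out.append([
            max(0.0, sum(k[j][c] * x[i + j][c]
                         for j in range(width) for c in range(channels)))
            for k in kernels
        ])
    return out


def global_max_pool(x):
    """Maximum of each filter's activation over all positions."""
    return [max(column) for column in zip(*x)]


def forward(seq, seed=0):
    rng = random.Random(seed)
    x = [ONE_HOT[base] for base in seq]
    shallow = conv1d_relu(x, rand_kernels(rng, 8, 3, 4))     # early features
    deep = conv1d_relu(shallow, rand_kernels(rng, 8, 3, 8))  # later features
    # Skip connection: shallow features bypass the second stage and are
    # concatenated with the deep features before the dense layer.
    features = global_max_pool(shallow) + global_max_pool(deep)
    weights = [rng.uniform(-0.5, 0.5) for _ in features]
    z = sum(w * f for w, f in zip(weights, features))
    prob = 1.0 / (1.0 + math.exp(-z))  # sigmoid output for a 4mC / non-4mC score
    return prob, features


if __name__ == "__main__":
    score, feats = forward("ACGTCCGATCGATCGGCTAAC")
    print(len(feats), round(score, 3))
```

Because the weights are random, the returned score has no predictive meaning; the sketch only shows that the dense layer receives 16 inputs, 8 from the shallow stage and 8 from the deep stage, rather than the deep features alone.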

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/627a/8605313/0649b5d07f0e/ga1.jpg
