Deep6mA: A deep learning framework for exploring similar patterns in DNA N6-methyladenine sites across different species.

Affiliations

Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China.

Center for Data Science, Zhejiang University, Hangzhou, China.

Publication information

PLoS Comput Biol. 2021 Feb 18;17(2):e1008767. doi: 10.1371/journal.pcbi.1008767. eCollection 2021 Feb.

DOI:10.1371/journal.pcbi.1008767

PMID:33600435

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7924747/

Abstract

N6-methyladenine (6mA) is an important form of DNA modification associated with a wide range of biological processes. Accurately identifying 6mA sites on a genomic scale is crucial for understanding the biological functions of 6mA. However, existing experimental techniques for detecting 6mA sites are not cost-effective, which creates a strong need for new computational methods. In this paper, we developed Deep6mA, a deep learning framework that identifies DNA 6mA sites without requiring any prior knowledge of 6mA or manually crafted sequence features, and its performance is superior to that of other DNA 6mA prediction tools. Specifically, 5-fold cross-validation on a rice benchmark dataset gives Deep6mA a sensitivity of 92.96% and a specificity of 95.06%, with an overall prediction accuracy of 94%. Importantly, we find that sequences with 6mA sites share similar patterns across different species. The model trained on rice data predicts the 6mA sites of three other species (Arabidopsis thaliana, Fragaria vesca and Rosa chinensis) well, with a prediction accuracy over 90%. In addition, we find that (1) 6mA tends to occur at GAGG motifs, which suggests that the sequence near the 6mA site may be conserved; and (2) 6mA is enriched in the TATA box of the promoter, which may be the main means by which it regulates downstream gene expression.
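
As a rough illustration of how the three reported metrics relate, the sketch below computes sensitivity, specificity and accuracy from confusion-matrix counts. The counts are hypothetical, chosen only to reproduce the reported sensitivity and specificity, and the equal number of positive and negative samples is an assumption that the abstract does not state. The function name `classification_metrics` is likewise illustrative and not taken from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # fraction of true 6mA sites recovered
    specificity = tn / (tn + fp)          # fraction of non-6mA sites rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts (assumption): 10,000 positive and 10,000 negative samples.
tp, fn = 9296, 704
tn, fp = 9506, 494

sn, sp, acc = classification_metrics(tp, fn, tn, fp)
print(f"sensitivity={sn:.2%} specificity={sp:.2%} accuracy={acc:.2%}")
# prints: sensitivity=92.96% specificity=95.06% accuracy=94.01%
```

Under the balanced-class assumption, accuracy is simply the mean of sensitivity and specificity, which is consistent with the reported overall accuracy of 94%.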
