SeqEnhDL: sequence-based classification of cell type-specific enhancers using deep learning models.

Author information

Wang Yupeng, Jaime-Lara Rosario B, Roy Abhrarup, Sun Ying, Liu Xinyue, Joseph Paule V

Affiliations

BDX Research and Consulting LLC, Herndon, VA, 20171, USA.

Division of Intramural Research, National Institute of Nursing Research, National Institutes of Health, Bethesda, MD, 20892, USA.

Publication information

BMC Res Notes. 2021 Mar 19;14(1):104. doi: 10.1186/s13104-021-05518-7.

Abstract

OBJECTIVE

To address the challenge of computational identification of cell type-specific regulatory elements on a genome-wide scale.

RESULTS

We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of "strong enhancer" chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, the fold changes of positional k-mers (k = 5, 7, 9 and 11) at each nucleotide position, relative to randomly selected non-coding sequences, were used as features for the deep learning models. Three deep learning models were implemented: a multi-layer perceptron (MLP), a convolutional neural network (CNN) and a recurrent neural network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers (including gkm-SVM and DanQ) in distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL can directly discriminate between enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue specificity can be accurately identified from their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.
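The positional k-mer fold-change features described above can be illustrated with a minimal Python sketch. It is one plausible reading of the abstract, not the SeqEnhDL implementation. The function names, the pseudocount, the log2 transform and the requirement that all sequences have equal length are assumptions made for illustration, and the toy example uses k = 3 rather than the k values reported in the abstract.

```python
from collections import Counter
import math


def positional_kmer_counts(seqs, k):
    """Count, for each start position, how often each k-mer occurs there.

    Assumes all sequences have the same length (an illustrative assumption).
    """
    n_positions = len(seqs[0]) - k + 1
    counts = [Counter() for _ in range(n_positions)]
    for s in seqs:
        for i in range(n_positions):
            counts[i][s[i:i + k]] += 1
    return counts


def fold_change_features(seq, fg_counts, bg_counts, n_fg, n_bg, k, pseudo=1.0):
    """Map a sequence to one log2 fold change per k-mer start position.

    The fold change compares the frequency of the k-mer at that position in
    the enhancer (foreground) set with its frequency in the background set.
    The pseudocount and log2 scaling are hypothetical choices.
    """
    feats = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        fg = (fg_counts[i][kmer] + pseudo) / (n_fg + pseudo)
        bg = (bg_counts[i][kmer] + pseudo) / (n_bg + pseudo)
        feats.append(math.log2(fg / bg))
    return feats


# Toy data: made-up sequences, not taken from ENCODE.
enhancers = ["ACGTA", "ACGTT", "ACGAA"]
background = ["TTGCA", "ACGTA", "GGGCC"]
k = 3

fg_counts = positional_kmer_counts(enhancers, k)
bg_counts = positional_kmer_counts(background, k)
features = fold_change_features(
    "ACGTA", fg_counts, bg_counts, len(enhancers), len(background), k
)
print(features)
```

Under these assumptions, the feature vectors computed for several values of k could be concatenated or stacked per position before being passed to an MLP, CNN or RNN.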

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f46d/7980595/c3044c5ea7e0/13104_2021_5518_Fig1_HTML.jpg