Wang Yupeng, Jaime-Lara Rosario B, Roy Abhrarup, Sun Ying, Liu Xinyue, Joseph Paule V
BDX Research and Consulting LLC, Herndon, VA, 20171, USA.
Division of Intramural Research, National Institute of Nursing Research, National Institutes of Health, Bethesda, MD, 20892, USA.
BMC Res Notes. 2021 Mar 19;14(1):104. doi: 10.1186/s13104-021-05518-7.
This study addresses the challenge of computationally identifying cell type-specific regulatory elements on a genome-wide scale.
We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of "strong enhancer" chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For each DNA sequence, the fold changes of positional k-mers (k = 5, 7, 9 and 11) at each nucleotide position, computed relative to randomly selected non-coding sequences, were used as features for the deep learning models. Three deep learning models were implemented: a multi-layer perceptron (MLP), a convolutional neural network (CNN) and a recurrent neural network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers (including gkm-SVM and DanQ) in distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL can directly discriminate between enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue specificity can be accurately identified from sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.
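The positional k-mer fold-change features described above can be illustrated with a minimal sketch. This is not the SeqEnhDL implementation: the function names (`positional_kmer_counts`, `fold_change_table`, `featurize`), the pseudocount, the log2 transform, the zero default for unseen k-mers, and the assumption that all sequences share one length are illustrative choices that the abstract does not specify.

```python
from collections import Counter
import math


def positional_kmer_counts(seqs, k):
    """Count k-mers at each start position across equal-length sequences."""
    n_pos = len(seqs[0]) - k + 1
    counts = [Counter() for _ in range(n_pos)]
    for s in seqs:
        for i in range(n_pos):
            counts[i][s[i:i + k]] += 1
    return counts


def fold_change_table(enhancers, background, k, pseudocount=1.0):
    """Per-position log2 fold change of k-mer frequency, enhancer vs. background.

    The pseudocount and the log2 scale are assumptions made for this sketch.
    """
    pos = positional_kmer_counts(enhancers, k)
    neg = positional_kmer_counts(background, k)
    table = []
    for p, n in zip(pos, neg):
        fc = {}
        for kmer in set(p) | set(n):
            f_pos = (p[kmer] + pseudocount) / (len(enhancers) + pseudocount)
            f_neg = (n[kmer] + pseudocount) / (len(background) + pseudocount)
            fc[kmer] = math.log2(f_pos / f_neg)
        table.append(fc)
    return table


def featurize(seq, tables):
    """Map a sequence to one feature row per k.

    `tables` maps k to the output of fold_change_table. K-mers never seen
    at a position get 0.0 (no enrichment), another assumption of this sketch.
    """
    features = []
    for k, table in sorted(tables.items()):
        row = [table[i].get(seq[i:i + k], 0.0) for i in range(len(table))]
        features.append(row)
    return features
```

With the k values named in the abstract, `tables` would hold one entry for each of k = 5, 7, 9 and 11, and each row has length L - k + 1 for a sequence of length L. How these rows are arranged as input to the MLP, CNN and RNN is not stated in the abstract and is left open here.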