Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Tongji University, Shanghai, China.
Computer Science and Technology, Tongji University, China.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbaa435.
Transcription factors (TFs) play an important role in regulating gene expression, so identifying the regions they bind has become a fundamental step in molecular and cellular biology. In recent years, a growing number of deep learning (DL)-based methods have been proposed for predicting TF binding sites (TFBSs), and they have achieved impressive prediction performance. However, these methods mainly focus on predicting the sequence specificity of TF-DNA binding, which is equivalent to a sequence-level binary classification task, and they fail to identify motifs and TFBSs accurately. In this paper, we develop a fully convolutional network coupled with global average pooling (FCNA), which by contrast is equivalent to a nucleotide-level binary classification task, to roughly locate TFBSs and accurately identify motifs. Experimental results on human ChIP-seq datasets show that FCNA significantly outperforms competing methods. In addition, we find that the regions located by FCNA can be used by motif discovery tools to further refine prediction performance. Furthermore, we observe that FCNA can accurately identify TF-DNA binding motifs across different cell lines and infer indirect TF-DNA binding.
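The contrast between sequence-level and nucleotide-level classification can be illustrated with a toy sketch. This is not the authors' FCNA architecture: it uses a single hand-set convolution filter (a trained network would learn its filters), and every name in it (`one_hot`, `conv1d_same`, `nucleotide_scores`, `global_average_pool`, the `TGA` toy motif, the bias value) is a hypothetical choice made for illustration. It only shows the general idea that a padded convolution yields one score per nucleotide, and that global average pooling collapses those scores into one sequence-level value.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_same(x, kernel, bias=0.0):
    """1D convolution with zero padding, so there is one output per nucleotide."""
    k = len(kernel)
    left = (k - 1) // 2
    out = []
    for i in range(len(x)):
        total = bias
        for j in range(k):
            pos = i + j - left
            if 0 <= pos < len(x):  # positions outside the sequence count as zeros
                total += sum(w * v for w, v in zip(kernel[j], x[pos]))
        out.append(total)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nucleotide_scores(seq, kernel, bias):
    """Per-nucleotide scores in (0, 1): the nucleotide-level view."""
    return [sigmoid(s) for s in conv1d_same(one_hot(seq), kernel, bias)]

def global_average_pool(scores):
    """Collapse per-nucleotide scores into one sequence-level score."""
    return sum(scores) / len(scores)

# Assumption: a hand-set filter that rewards the toy motif "TGA".
# Rows are filter positions; columns follow A, C, G, T.
kernel = [
    [-2.0, -2.0, -2.0, 2.0],   # position 1 prefers T
    [-2.0, -2.0, 2.0, -2.0],   # position 2 prefers G
    [2.0, -2.0, -2.0, -2.0],   # position 3 prefers A
]
bias = -4.0

seq = "CCCCTGACCCC"
scores = nucleotide_scores(seq, kernel, bias)
peak = max(range(len(scores)), key=scores.__getitem__)  # index of the best-scoring nucleotide
pooled = global_average_pool(scores)
background = global_average_pool(nucleotide_scores("CCCCCCCCCCC", kernel, bias))
```

In this sketch `peak` points at the centre of the `TGA` occurrence, which is the kind of rough localization a nucleotide-level output makes possible, while `pooled` and `background` are the single numbers a sequence-level classifier would compare.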