Zhang Yonglin, Mo Qi, Xue Li, Luo Jiesi
Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China.
School of Public Health, Southwest Medical University, Luzhou 646000, China.
Genomics. 2021 Nov;113(6):3774-3781. doi: 10.1016/j.ygeno.2021.09.009. Epub 2021 Sep 14.
As key components of gene regulation, transcription factors (TFs) play an important role in many biological processes. To fully understand the mechanisms underlying TF-mediated gene regulation, it is critical to accurately identify TF binding sites and predict their affinities. Deep learning (DL) algorithms have recently achieved promising results in predicting DNA-TF binding. However, the various deep learning architectures have not been systematically compared, and the relative merit of each remains unclear. To address this problem, we applied four different deep learning architectures to SELEX-seq and HT-SELEX data covering three species and 35 families. We evaluated and compared the performance of the different deep neural models using 10-fold cross-validation. Our results indicate that the hybrid CNN + DNN model performs best. We expect that our study will be broadly applicable to modeling and predicting TF binding specificity as more high-throughput affinity data become available.
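The abstract does not specify how sequences were encoded or how the cross-validation folds were built. As a minimal sketch only, the following assumes a one-hot encoding of DNA (a common input format for CNN models) and a shuffled 10-fold split. The function names `one_hot` and `k_fold_indices`, the base order A, C, G, T, and the fixed seed are illustrative assumptions, not the authors' implementation.

```python
import random

BASES = "ACGT"  # assumed column order; the abstract does not state one


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T order)."""
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:  # unknown bases such as N stay all-zero
            row[index[base]] = 1
        rows.append(row)
    return rows


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is held out exactly once."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [order[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx
```

In a comparison of this kind, each architecture would be trained on `train_idx` and scored on `test_idx` for every fold, and the per-fold scores averaged, so that all models are evaluated on identical splits.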