State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, People's Republic of China.
J Chem Inf Model. 2024 May 27;64(10):4002-4008. doi: 10.1021/acs.jcim.4c00047. Epub 2024 May 9.
Transcription factors (TFs) are important regulators of vital cellular activities, and identifying transcription factor binding sites (TFBSs) helps to elucidate gene regulatory mechanisms. Studies have shown that cell-free DNA (cfDNA) has relatively higher coverage at TFBSs, because bound TFs protect the DNA from nuclease degradation, and that short cfDNA fragments are enriched at TFBSs. However, noninvasive identification of TFBSs by experimental techniques remains very difficult. In this study, we propose a deep learning-based approach that noninvasively predicts TFBSs in cfDNA by learning sequence information from known TFBSs with convolutional neural networks. With the addition of long short-term memory (LSTM), our model achieved an area under the curve of 84%. Applying this model to cfDNA, we found consistent motifs in the cfDNA fragments and lower coverage upstream and downstream of these fragments, which is consistent with a previous study. We also found that the binding sites of the same TF differ between cell lines. TF-specific target genes were detected from cfDNA and were enriched in cancer-related pathways. In summary, our method of locating TFBSs from plasma has the potential to reflect intrinsic regulatory mechanisms noninvasively and to provide technical guidance for dynamic monitoring of disease in clinical practice.
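The abstract does not describe the network's layers, so the following is only a minimal sketch of the general idea behind learning sequence information with a convolutional filter, not the authors' model. It one-hot encodes a DNA fragment and slides a single filter along it, which is what the first layer of a sequence CNN typically does. The helper names (one_hot, motif_kernel, conv1d_scan), the motif "TGAC", the example fragment, and the +1/-1 filter weights are all hypothetical, chosen for illustration; a trained model would learn its filter weights from known TFBSs and, as in the abstract, could pass the resulting feature maps to an LSTM.

```python
# Illustrative sketch only: NOT the authors' architecture, data, or code.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (A, C, G, T).

    Bases outside ACGT (for example N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def motif_kernel(motif, match=1.0, mismatch=-1.0):
    """Build a hand-made (width x 4) filter that rewards one motif.

    Hypothetical stand-in for a filter whose weights a CNN would learn.
    """
    return [[match if b == base else mismatch for b in BASES] for base in motif]


def conv1d_scan(encoded, kernel):
    """Slide the filter along the sequence (stride 1, no padding)."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(len(BASES))
        )
        scores.append(score)
    return scores


kernel = motif_kernel("TGAC")          # hypothetical motif
fragment = "AATGACGT"                  # hypothetical cfDNA fragment
scores = conv1d_scan(one_hot(fragment), kernel)

# Position with the strongest filter response, i.e. the best motif match.
best = max(range(len(scores)), key=scores.__getitem__)
# ReLU followed by global max pooling: a common way to summarise a filter.
pooled = max(max(s, 0.0) for s in scores)
print(best, pooled)
```

Here the filter responds most strongly at the offset where the fragment matches the motif exactly. In a real classifier, many such filters would be trained jointly and their pooled activations combined to produce a binding probability per fragment.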