Gohl Patrick, Oliva Baldo
Department of Medicine and Life Sciences, SBI-GRIB, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain.
BMC Bioinformatics. 2025 Mar 10;26(1):81. doi: 10.1186/s12859-025-06094-4.
Mutations in non-coding regulatory regions of DNA may lead to disease through the disruption of transcription factor binding. However, our understanding of the binding patterns of transcription factors, and of the effects that changes to their binding sites have on their action, remains limited. To address this issue, we trained a deep learning model to predict the effects of single nucleotide polymorphisms (SNPs) on transcription factor binding. Allele-specific binding (ASB) data from chromatin immunoprecipitation sequencing (ChIP-seq) experiments were paired with high sequence-identity DNA-binding domains assessed in protein binding microarray (PBM) experiments. For each transcription factor, a paired DNA-binding domain was selected, from which we derived E-score profiles for the reference and alternate DNA sequences of ASB events. A convolutional neural network (CNN) was trained to predict whether these profiles were indicative of ASB gain/loss or no change in binding. In total, 18,211 E-score profiles from 113 transcription factors were split into training, validation and test data. We compared the performance of the trained model with other available platforms for predicting the effect of SNPs on transcription factor binding. Our model demonstrated increased accuracy and ASB recall in comparison to the best-scoring benchmark tools.
In this paper we present our model, SNPeBoT (Single Nucleotide Polymorphism effect on Binding of Transcription Factors), in its standalone and web server forms. The increased recovery and prediction accuracy of allele-specific binding events could prove useful in discovering non-coding mutations relevant to disease.
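The notion of paired E-score profiles for a reference and an alternate sequence can be illustrated with a minimal sketch. This is not the SNPeBoT implementation: the function names, the k-mer length, the default score for unseen k-mers, and the toy score table are all assumptions made for illustration, and reverse-complement handling is left out. The sketch only shows how sliding a k-mer window over two alleles yields two aligned score vectors of the kind a classifier could take as input.

```python
from typing import Dict, List, Tuple


def escore_profile(seq: str, escores: Dict[str, float], k: int,
                   default: float = 0.0) -> List[float]:
    """Return one score per k-mer window along `seq`.

    `escores` is a hypothetical lookup table (k-mer -> E-score);
    k-mers missing from the table receive `default` (an assumption).
    """
    seq = seq.upper()
    return [escores.get(seq[i:i + k], default)
            for i in range(len(seq) - k + 1)]


def paired_profiles(ref: str, alt: str, escores: Dict[str, float],
                    k: int) -> Tuple[List[float], List[float]]:
    """Build aligned profiles for the reference and alternate alleles."""
    if len(ref) != len(alt):
        raise ValueError("ref and alt must have equal length for a SNP")
    return escore_profile(ref, escores, k), escore_profile(alt, escores, k)


# Toy table with k = 3 purely for readability; the values are invented.
toy_scores = {"ACG": 0.4, "CGT": 0.3, "GTA": 0.1, "ACC": 0.2}
ref_profile, alt_profile = paired_profiles("ACGTA", "ACCTA", toy_scores, k=3)
print(ref_profile)  # [0.4, 0.3, 0.1]
print(alt_profile)  # [0.2, 0.0, 0.0]
```

In this toy case the single substitution lowers the score of every window that overlaps it, which is the kind of pattern a model would need to associate with a loss of binding.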