School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel.
Bioinformatics. 2020 Dec 30;36(Suppl_2):i634-i642. doi: 10.1093/bioinformatics/btaa789.
Transcription factor (TF) DNA-binding is a central mechanism in gene regulation. Biologists would like to know where and when these factors bind DNA, so they require accurate DNA-binding models that can predict binding to any DNA sequence. Recent technologies measure the binding of a single TF to thousands of DNA sequences. One of the prevailing techniques, high-throughput SELEX, measures protein-DNA binding by high-throughput sequencing over several cycles of enrichment. Unfortunately, current computational methods for inferring binding preferences from high-throughput SELEX data do not exploit the richness of these data and underuse the most advanced computational technique, deep neural networks.
To better characterize the binding preferences of TFs from these experimental data, we developed DeepSELEX, a new algorithm to infer intrinsic DNA-binding preferences using deep neural networks. DeepSELEX takes advantage of the richness of high-throughput sequencing data and learns the DNA-binding preferences by observing the changes in DNA sequences through the experimental cycles. For DNA-binding inference from high-throughput SELEX data, DeepSELEX outperforms extant methods in in vitro binding prediction and is on par with the state of the art in in vivo binding prediction. Analysis of the model parameters reveals that it learns biologically relevant features that shed light on TFs' binding mechanism.
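As a rough illustration of what "observing the changes in DNA sequences through the experimental cycles" could look like as a learning problem, the sketch below one-hot encodes reads and labels each read with the cycle it was sequenced in. This is only a minimal sketch under stated assumptions: the function names (`one_hot`, `build_cycle_dataset`), the cycle-index labelling, and the toy reads are hypothetical and are not taken from the DeepSELEX code or paper.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (len(seq), 4) matrix.

    Bases outside A/C/G/T (e.g. N) are left as all-zero rows.
    """
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            mat[i, j] = 1.0
    return mat

def build_cycle_dataset(reads_by_cycle):
    """Build (X, y) from reads grouped by SELEX cycle.

    reads_by_cycle[c] holds the reads sequenced at cycle c; all reads are
    assumed to have the same length. The label of a read is its cycle index
    (an assumption made for this sketch).
    """
    xs, ys = [], []
    for cycle, reads in enumerate(reads_by_cycle):
        for read in reads:
            xs.append(one_hot(read))
            ys.append(cycle)
    return np.stack(xs), np.array(ys, dtype=np.int64)

# Toy, made-up reads for three cycles (not real SELEX data).
reads_by_cycle = [["ACGTAC", "TTGACA"], ["ACGTGG"], ["CACGTG", "NACGTG"]]
X, y = build_cycle_dataset(reads_by_cycle)
print(X.shape, y)
```

A classifier trained on such `(X, y)` pairs would have to pick up sequence features that become enriched in later cycles; how DeepSELEX actually formulates its inputs, labels, and network is described in the paper and repository, not here.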
DeepSELEX is available through github.com/OrensteinLab/DeepSELEX/.
Supplementary data are available at Bioinformatics online.