Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY, USA.
Nat Methods. 2019 Sep;16(9):858-861. doi: 10.1038/s41592-019-0511-y. Epub 2019 Aug 12.
Decoding transcription factor (TF) binding signals in genomic DNA is a fundamental problem. Here we present a prediction model called BindSpace that learns to embed DNA sequences and TF labels into the same space. By training on binding data from hundreds of TFs and embedding over 1 million DNA sequences, BindSpace achieves state-of-the-art multiclass binding prediction performance, in vitro and in vivo, and can distinguish between the binding signals of closely related TFs.
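The abstract describes the shared embedding space only at a high level. As a minimal sketch, assuming a bag-of-k-mers representation in which a sequence's embedding is the sum of its k-mer vectors and multiclass prediction ranks TF labels by cosine similarity in that space, the scoring step might look like the following. The k-mer length `K`, dimension `DIM`, the labels `TF_A` and `TF_B`, and their toy motifs are all hypothetical; the vectors are random placeholders rather than trained embeddings, and this is not the authors' implementation or training procedure.

```python
import math
import random
from itertools import product

K = 3     # hypothetical k-mer length
DIM = 64  # hypothetical embedding dimension

def kmers(seq, k=K):
    """Return all overlapping k-mers of a DNA sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def embed_sequence(seq, kmer_table):
    """Embed a sequence as the sum of its k-mer vectors; unknown k-mers are skipped."""
    vec = [0.0] * DIM
    for km in kmers(seq):
        for i, x in enumerate(kmer_table.get(km, ())):
            vec[i] += x
    return vec

def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_labels(seq, kmer_table, label_table):
    """Rank TF labels by cosine similarity to the sequence embedding (best first)."""
    s = embed_sequence(seq, kmer_table)
    scores = {tf: cosine(s, vec) for tf, vec in label_table.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Placeholder k-mer vectors: random here; a real model would learn them.
rng = random.Random(0)
kmer_table = {
    "".join(p): [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    for p in product("ACGT", repeat=K)
}

# Hypothetical TF labels whose vectors are set to the embeddings of toy motifs
# purely for illustration; a trained model would learn label vectors jointly.
label_table = {
    "TF_A": embed_sequence("CACGTG", kmer_table),
    "TF_B": embed_sequence("TTAATT", kmer_table),
}

print(rank_labels("GGCACGTGGG", kmer_table, label_table))
```

Because sequences and labels live in one vector space, adding a new TF in this sketch only means adding one more label vector; the sequence encoder is unchanged. Summing k-mer vectors discards k-mer order, which is a simplifying assumption made here rather than a detail taken from the paper.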