Qin Qian, Feng Jianxing
Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, Shanghai, China.
PLoS Comput Biol. 2017 Feb 24;13(2):e1005403. doi: 10.1371/journal.pcbi.1005403. eCollection 2017 Feb.
Understanding the cell-specific binding patterns of transcription factors (TFs) is fundamental to studying gene regulatory networks in biological systems, and ChIP-seq both provides valuable data for this purpose and is considered the gold standard. Despite tremendous efforts from the scientific community to conduct TF ChIP-seq experiments, the available data cover only a small fraction of all possible combinations of TFs and cell lines. In this study, we demonstrate a method that accurately predicts cell-specific TF binding for TF-cell line combinations using ChIP-seq data available for only a small fraction (4%) of the combinations. The proposed model, termed TFImpute, is a deep neural network trained in a multi-task learning setting to borrow information across transcription factors and cell lines. Compared with existing methods, TFImpute achieves comparable accuracy on TF-cell line combinations with ChIP-seq data and better accuracy on TF-cell line combinations without ChIP-seq data. The approach can also predict cell line-specific enhancer activities in K562 and HepG2 cell lines, as measured by massively parallel reporter assays, and the impact of SNPs on TF binding.
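The idea of borrowing information across TFs and cell lines can be illustrated with a toy forward pass. The sketch below is not the TFImpute architecture, which the abstract does not specify. It assumes one plausible design: sequence filters shared by all tasks, plus a learned embedding per TF and per cell line that gates those filters. All names (`ToySharedModel`, `one_hot`, the layer sizes) are hypothetical, the weights are random, and training is omitted.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (len, 4) matrix; unknown bases stay all-zero."""
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            x[i, j] = 1.0
    return x


class ToySharedModel:
    """Hypothetical toy model, not the published one.

    Sequence filters are shared by every TF-cell line pair; the TF and
    cell-line embeddings decide how strongly each filter counts.
    """

    def __init__(self, n_tfs, n_cells, n_filters=8, width=6, dim=4, seed=0):
        rng = np.random.default_rng(seed)
        self.filters = rng.normal(0.0, 0.1, (n_filters, width, 4))
        self.tf_emb = rng.normal(0.0, 0.1, (n_tfs, dim))
        self.cell_emb = rng.normal(0.0, 0.1, (n_cells, dim))
        self.gate_w = rng.normal(0.0, 0.1, (2 * dim, n_filters))
        self.out_w = rng.normal(0.0, 0.1, n_filters)
        self.out_b = 0.0

    def seq_features(self, seq):
        """Scan the shared filters over the sequence and max-pool over positions.

        The sequence must be at least as long as the filter width.
        """
        x = one_hot(seq)
        width = self.filters.shape[1]
        windows = np.stack([x[i:i + width] for i in range(len(seq) - width + 1)])
        scores = np.einsum("nwc,fwc->nf", windows, self.filters)
        return scores.max(axis=0)

    def predict(self, seq, tf_id, cell_id):
        """Return a binding probability for one (sequence, TF, cell line) triple."""
        context = np.concatenate([self.tf_emb[tf_id], self.cell_emb[cell_id]])
        gate = 1.0 / (1.0 + np.exp(-(context @ self.gate_w)))
        logit = (self.seq_features(seq) * gate) @ self.out_w + self.out_b
        return 1.0 / (1.0 + np.exp(-logit))
```

Because each TF and each cell line has its own embedding, a model of this shape can score a pair that never appeared together in training, for example `ToySharedModel(n_tfs=3, n_cells=2).predict("ACGTACGTACGT", tf_id=2, cell_id=1)`, provided that the TF and the cell line were each seen in some other combination.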