Imputation for transcription factor binding predictions based on deep learning.

Author information

Qin Qian, Feng Jianxing

Affiliation

Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, Shanghai, China.

Publication information

PLoS Comput Biol. 2017 Feb 24;13(2):e1005403. doi: 10.1371/journal.pcbi.1005403. eCollection 2017 Feb.

DOI:10.1371/journal.pcbi.1005403

PMID:28234893

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5345877/

Abstract

Understanding the cell-specific binding patterns of transcription factors (TFs) is fundamental to studying gene regulatory networks in biological systems. ChIP-seq provides valuable data for this purpose and is considered the gold standard. Despite tremendous efforts from the scientific community to conduct TF ChIP-seq experiments, the available data cover only a small percentage of all possible combinations of TFs and cell lines. In this study, we demonstrate a method that accurately predicts cell-specific TF binding for TF-cell line combinations, using available ChIP-seq data for only a small fraction (4%) of the combinations. The proposed model, termed TFImpute, is based on a deep neural network with a multi-task learning setting that borrows information across transcription factors and cell lines. Compared with existing methods, TFImpute achieves comparable accuracy on TF-cell line combinations with ChIP-seq data and better accuracy on TF-cell line combinations without ChIP-seq data. The approach can predict cell-line-specific enhancer activities in K562 and HepG2 cell lines, as measured by massively parallel reporter assays, and can predict the impact of SNPs on TF binding.
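The abstract does not spell out the network architecture, so the following is only a minimal sketch of the general idea of sharing parameters across TFs and cell lines. It assumes a hypothetical design in which a shared convolutional scan of the DNA sequence is modulated by learned TF and cell line embeddings; the class name `SharedBindingModel`, the gating scheme, and all sizes are illustrative assumptions, not the TFImpute implementation. Weights are random and untrained, so the output is not a meaningful prediction.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    idx = np.array([BASES.index(b) for b in seq])
    return np.eye(4)[idx]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class SharedBindingModel:
    """Hypothetical multi-task scorer: one set of sequence filters is shared
    by every (TF, cell line) pair, and per-TF / per-cell-line embeddings
    decide how strongly each filter contributes."""

    def __init__(self, n_tfs, n_cells, n_filters=8, width=5, emb_dim=4, seed=0):
        rng = np.random.default_rng(seed)
        self.filters = rng.normal(0, 0.1, (n_filters, width, 4))   # shared
        self.tf_emb = rng.normal(0, 0.1, (n_tfs, emb_dim))         # one row per TF
        self.cell_emb = rng.normal(0, 0.1, (n_cells, emb_dim))     # one row per cell line
        self.gate_w = rng.normal(0, 0.1, (2 * emb_dim, n_filters))
        self.out_w = rng.normal(0, 0.1, n_filters)

    def sequence_features(self, seq):
        """Slide every filter along the sequence, apply ReLU, max-pool over positions."""
        x = one_hot(seq)
        width = self.filters.shape[1]
        windows = np.stack([x[i:i + width] for i in range(len(seq) - width + 1)])
        scores = np.einsum("pwc,fwc->pf", windows, self.filters)
        return np.maximum(scores, 0.0).max(axis=0)

    def predict(self, seq, tf_id, cell_id):
        """Return a binding score in (0, 1) for one sequence and one TF-cell line pair."""
        feats = self.sequence_features(seq)
        context = np.concatenate([self.tf_emb[tf_id], self.cell_emb[cell_id]])
        gate = sigmoid(context @ self.gate_w)   # TF/cell-specific filter weighting
        return sigmoid((feats * gate) @ self.out_w)


model = SharedBindingModel(n_tfs=3, n_cells=2)
score = model.predict("ACGTACGTACGT", tf_id=2, cell_id=1)
```

Because the TF and cell line enter only through their own embedding rows, a pair that never appeared together in training can still be scored, provided each member was seen in some other combination. That is the sense in which such a model could impute missing TF-cell line combinations.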

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/67f8/5345877/50cd63c1005c/pcbi.1005403.g001.jpg

Similar articles

BinDNase: a discriminatory approach for transcription factor binding prediction using DNase I hypersensitivity data.

Bioinformatics. 2015 Sep 1;31(17):2852-9. doi: 10.1093/bioinformatics/btv294. Epub 2015 May 7.

TICA: Transcriptional Interaction and Coregulation Analyzer.

Genomics Proteomics Bioinformatics. 2018 Oct;16(5):342-353. doi: 10.1016/j.gpb.2018.05.004. Epub 2018 Dec 19.

MixChIP: a probabilistic method for cell type specific protein-DNA binding analysis.

BMC Bioinformatics. 2015 Dec 24;16:413. doi: 10.1186/s12859-015-0834-3.

Cell-type and transcription factor specific enrichment of transcriptional cofactor motifs in ENCODE ChIP-seq data.

BMC Genomics. 2013;14 Suppl 5(Suppl 5):S2. doi: 10.1186/1471-2164-14-S5-S2. Epub 2013 Oct 16.

Revealing transcription factor and histone modification co-localization and dynamics across cell lines by integrating ChIP-seq and RNA-seq data.

BMC Genomics. 2018 Dec 31;19(Suppl 10):914. doi: 10.1186/s12864-018-5278-5.

Cell-type specificity of ChIP-predicted transcription factor binding sites.

BMC Genomics. 2012 Aug 3;13:372. doi: 10.1186/1471-2164-13-372.

Transcription factor-binding k-mer analysis clarifies the cell type dependency of binding specificities and cis-regulatory SNPs in humans.

BMC Genomics. 2023 Oct 7;24(1):597. doi: 10.1186/s12864-023-09692-9.

An efficient method to transcription factor binding sites imputation via simultaneous completion of multiple matrices with positional consistency.

Mol Biosyst. 2017 Aug 22;13(9):1827-1837. doi: 10.1039/c7mb00155j.

Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction.

Nucleic Acids Res. 2017 Jan 9;45(1):54-66. doi: 10.1093/nar/gkw1061. Epub 2016 Nov 29.

Cited by

Improving plant breeding through AI-supported data integration.

Theor Appl Genet. 2025 Jun 2;138(6):132. doi: 10.1007/s00122-025-04910-2.

The evaluation of transcription factor binding site prediction tools in human and Arabidopsis genomes.

BMC Bioinformatics. 2024 Dec 2;25(1):371. doi: 10.1186/s12859-024-05995-0.

Integrating Prior Knowledge Using Transformer for Gene Regulatory Network Inference.

Adv Sci (Weinh). 2025 Jan;12(3):e2409990. doi: 10.1002/advs.202409990. Epub 2024 Nov 28.

Big data and artificial intelligence-aided crop breeding: Progress and prospects.

J Integr Plant Biol. 2025 Mar;67(3):722-739. doi: 10.1111/jipb.13791. Epub 2024 Oct 28.

Predicting Transcription Factor Binding Sites with Deep Learning.

Int J Mol Sci. 2024 May 3;25(9):4990. doi: 10.3390/ijms25094990.

Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models.

Int J Mol Sci. 2023 Nov 1;24(21):15858. doi: 10.3390/ijms242115858.

Deep learning-empowered crop breeding: intelligent, efficient and promising.

Front Plant Sci. 2023 Oct 3;14:1260089. doi: 10.3389/fpls.2023.1260089. eCollection 2023.

Screening for functional regulatory variants in open chromatin using GenIE-ATAC.

Nucleic Acids Res. 2023 Jun 23;51(11):e64. doi: 10.1093/nar/gkad332.

The ENCODE Imputation Challenge: a critical assessment of methods for cross-cell type imputation of epigenomic profiles.

Genome Biol. 2023 Apr 18;24(1):79. doi: 10.1186/s13059-023-02915-y.

Evidence for the role of transcription factors in the co-transcriptional regulation of intron retention.

Genome Biol. 2023 Mar 22;24(1):53. doi: 10.1186/s13059-023-02885-1.
