Integrative analysis of genomic, functional and protein interaction data predicts long-range enhancer-target gene interactions.

Affiliations

Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Max Planck Institute for Molecular Genetics, Berlin, Germany.

Publication information

Nucleic Acids Res. 2011 Apr;39(7):2492-502. doi: 10.1093/nar/gkq1081. Epub 2010 Nov 24.

Abstract

Multicellular organismal development is controlled by a complex network of transcription factors, promoters and enhancers. Although reliable computational and experimental methods exist for enhancer detection, prediction of their target genes remains a major challenge. On the basis of available literature and ChIP-seq and ChIP-chip data for enhanceosome factor p300 and the transcriptional regulator Gli3, we found that genomic proximity and conserved synteny predict target genes with a relatively low recall of 12-27% within 2 Mb intervals centered at the enhancers. Here, we show that functional similarities between enhancer binding proteins and their transcriptional targets and proximity in the protein-protein interactome improve prediction of target genes. We used all four features to train random forest classifiers that predict target genes with a recall of 58% in 2 Mb intervals that may contain dozens of genes, representing a better than two-fold improvement over the performance of prediction based on single features alone. Genome-wide ChIP data is still relatively poorly understood, and it remains difficult to assign biological significance to binding events. Our study represents a first step in integrating various genomic features in order to elucidate the genomic network of long-range regulatory interactions.
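To make the proximity baseline and the recall figures concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "genomic proximity" means choosing the gene whose transcription start site lies closest to the enhancer within the 2 Mb interval, and that recall is the fraction of known enhancer-target pairs recovered. All gene names, coordinates, and helper names (`Gene`, `nearest_gene`, `recall`) are hypothetical toy values introduced for illustration.

```python
from dataclasses import dataclass

WINDOW = 2_000_000  # 2 Mb interval centered at the enhancer, as in the abstract


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    tss: int  # transcription start site (hypothetical coordinate)


def genes_in_window(chrom, pos, genes, window=WINDOW):
    """Return genes on the same chromosome within window/2 of the enhancer."""
    half = window // 2
    return [g for g in genes if g.chrom == chrom and abs(g.tss - pos) <= half]


def nearest_gene(chrom, pos, genes, window=WINDOW):
    """Proximity baseline (assumed definition): closest TSS inside the window."""
    candidates = genes_in_window(chrom, pos, genes, window)
    if not candidates:
        return None
    return min(candidates, key=lambda g: abs(g.tss - pos)).name


def recall(predicted, true_targets):
    """Fraction of known enhancer-target pairs that the prediction recovers."""
    if not true_targets:
        return 0.0
    hits = sum(1 for e, t in true_targets.items() if predicted.get(e) == t)
    return hits / len(true_targets)


# Toy data, invented for this sketch only.
genes = [
    Gene("A", "chr1", 1_000_000),
    Gene("B", "chr1", 1_400_000),
    Gene("C", "chr1", 2_500_000),
    Gene("D", "chr2", 500_000),
]
enhancers = {
    "e1": ("chr1", 1_100_000),
    "e2": ("chr1", 1_900_000),
    "e3": ("chr2", 3_000_000),
}
true_targets = {"e1": "A", "e2": "A", "e3": "D"}

predictions = {
    eid: nearest_gene(chrom, pos, genes) for eid, (chrom, pos) in enhancers.items()
}
baseline_recall = recall(predictions, true_targets)
print(predictions, round(baseline_recall, 3))
```

In this toy setup the nearest-gene rule misses a target that skips over closer genes (e2) and a target outside the interval (e3). The first kind of miss is the long-range case that the additional features in the abstract, functional similarity and interactome proximity, are meant to address.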

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd9c/3074119/11b80ccebb91/gkq1081f1.jpg
