Inferring direct regulatory targets of a transcription factor in the DREAM2 challenge.

Author information

Vega Vinsensius B, Woo Xing Yi, Hamidi Habib, Yeo Hock Chuan, Yeo Zhen Xuan, Bourque Guillaume, Clarke Neil D

Affiliation

Genome Institute of Singapore, Singapore.

Publication information

Ann N Y Acad Sci. 2009 Mar;1158:215-23. doi: 10.1111/j.1749-6632.2008.03759.x.

Abstract

In the DREAM2 community-wide experiment on regulatory network inference, one of the challenges was to identify which genes, in a list of 200, are direct regulatory targets of the transcription factor BCL6. The organizers of the challenge defined targets based on gene expression and chromatin immunoprecipitation experiments (ChIP-chip). The expression data were publicly available; the ChIP-chip data were not. In order to assess the likelihood that a gene is a BCL6 target, we used three classes of information: expression-level differences, over-representation of sequence motifs in promoter regions, and gene ontology annotations. A weight was attached to each analysis based on how well it identified BCL6-bound genes as defined by publicly available ChIP-chip data. By the organizers' criteria, our group, GenomeSingapore, performed best. However, our retrospective analysis indicates that this success was dominated by a gene expression analysis that was predicated on a regulatory model known to be favored by the organizers. We also noted that the 200-gene test set was enriched only in genes that are upregulated, while genes bound by BCL6 are enriched in both upregulated and downregulated genes. Together, these observations suggest possible model biases in the selection of the gold-standard gene set and imply that our success was attained in part by adhering to the same assumptions. We argue that model biases of this type are unavoidable in the inference of regulatory networks and, for that reason, we suggest that future community-wide experiments of this type should focus on the prediction of data, rather than models.
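The weighting step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the weights were computed, so the top-k precision measure, the function names (`top_k_precision`, `combine_evidence`), and the toy scores below are all assumptions made for illustration. Each analysis is taken to assign a score to every gene, and its weight is how well its top-ranked genes agree with a reference set of BCL6-bound genes.

```python
from typing import Dict, Set


def top_k_precision(scores: Dict[str, float], bound: Set[str], k: int) -> float:
    """Fraction of the k highest-scoring genes found in the reference bound set.

    Hypothetical weighting measure; the abstract does not specify one.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    return sum(gene in bound for gene in ranked) / len(ranked)


def combine_evidence(
    analyses: Dict[str, Dict[str, float]], bound: Set[str], k: int
) -> Dict[str, float]:
    """Weighted sum of per-analysis gene scores.

    Each analysis is weighted by its top-k precision against the reference
    set; genes missing from an analysis contribute 0 for that analysis.
    """
    weights = {name: top_k_precision(s, bound, k) for name, s in analyses.items()}
    genes = set().union(*analyses.values())  # union of all gene names
    return {
        gene: sum(weights[name] * analyses[name].get(gene, 0.0) for name in analyses)
        for gene in genes
    }


# Toy, made-up scores for three placeholder genes (not real data).
analyses = {
    "expression": {"A": 1.0, "B": 0.5, "C": 0.0},
    "motif": {"A": 0.0, "B": 1.0, "C": 0.5},
}
reference_bound = {"A"}  # stand-in for a public ChIP-chip bound set

combined = combine_evidence(analyses, reference_bound, k=1)
ranking = sorted(combined, key=combined.get, reverse=True)
```

In this toy case the expression analysis ranks the one reference gene first and gets weight 1, while the motif analysis gets weight 0, so the combined ranking simply follows the expression scores. That mirrors, in miniature, the abstract's observation that a single analysis can dominate the final prediction.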

