Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA.
Department of Computer Science and Engineering, Washington University, St. Louis, Missouri 63130, USA.
Genome Res. 2020 Mar;30(3):459-471. doi: 10.1101/gr.259655.119. Epub 2020 Feb 14.
A high-confidence map of the direct, functional targets of each transcription factor (TF) requires convergent evidence from independent sources. Two significant sources of evidence are TF binding locations and the transcriptional responses to direct TF perturbations. Systematic data sets of both types exist for yeast and human, but they rarely converge on a common set of direct, functional targets for a TF. Even the few genes that are both bound and responsive may not be direct functional targets. Our analysis shows that when there are many nonfunctional binding sites and many indirect targets, nonfunctional sites are expected to occur in the cis-regulatory DNA of indirect targets by chance. To address this problem, we introduce dual threshold optimization (DTO), a new method for setting significance thresholds on binding and perturbation-response data, and show that it improves convergence. It also enables comparison of binding data to perturbation-response data that have been processed by network inference algorithms, which further improves convergence. The combination of dual threshold optimization and network inference greatly expands the high-confidence TF network map in both yeast and human. Next, we analyze a comprehensive new data set measuring the transcriptional response shortly after inducing overexpression of a yeast TF. We also present a new yeast binding location data set obtained by transposon calling cards and compare it to recent ChIP-exo data. These new data sets improve convergence and expand the high-confidence network synergistically.