Wang Yong, Zhang Xiang-Sun, Xia Yu
Bioinformatics Program, Department of Chemistry, Boston University, Boston, MA 02215, USA.
Nucleic Acids Res. 2009 Oct;37(18):5943-58. doi: 10.1093/nar/gkp625. Epub 2009 Aug 6.
Transcriptional cooperativity among several transcription factors (TFs) is believed to be the main mechanism of complexity and precision in transcriptional regulatory programs. Here, we present a Bayesian network framework to reconstruct a high-confidence whole-genome map of transcriptional cooperativity in Saccharomyces cerevisiae by integrating a comprehensive list of 15 genomic features. We design a Bayesian network structure to capture the dominant correlations among features and TF cooperativity, and introduce a supervised learning framework with a well-constructed gold-standard dataset. This framework allows us to assess the predictive power of each genomic feature, validate the superior performance of our Bayesian network compared to alternative methods, and integrate genomic features for optimal TF cooperativity prediction. Data integration reveals 159 high-confidence predicted cooperative relationships among 105 TFs, most of which are subsequently validated by literature search. The existing and predicted transcriptional cooperativities can be grouped into three categories based on the combination patterns of the genomic features, providing further biological insights into the different types of TF cooperativity. Our methodology is the first supervised learning approach for predicting transcriptional cooperativity, compares favorably to alternative unsupervised methodologies, and can be applied to other genomic data integration tasks where high-quality gold-standard positive data are scarce.
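The supervised integration step described in the abstract can be illustrated with a deliberately simplified sketch. It uses a naive Bayes model, the simplest Bayesian network, which assumes the features are conditionally independent given the class. The abstract states that the authors' network is designed to capture dominant correlations among features, so this is not their structure. The three binary features, the toy gold-standard rows, the pseudocount, and the function names are all hypothetical and chosen only for illustration.

```python
import math

def fit_log_likelihood_ratios(positives, negatives, pseudocount=1.0):
    """Estimate per-feature log likelihood ratios from gold-standard TF pairs.

    Each row is a tuple of binary features (1 = evidence present).
    Returns one (llr_if_present, llr_if_absent) pair per feature.
    The pseudocount smooths estimates when gold-standard data are scarce.
    """
    n_features = len(positives[0])
    llrs = []
    for j in range(n_features):
        # Smoothed probability that feature j is present in each class.
        p_pos = (sum(row[j] for row in positives) + pseudocount) / (
            len(positives) + 2 * pseudocount)
        p_neg = (sum(row[j] for row in negatives) + pseudocount) / (
            len(negatives) + 2 * pseudocount)
        llrs.append((math.log(p_pos / p_neg),
                     math.log((1 - p_pos) / (1 - p_neg))))
    return llrs

def score_pair(features, llrs):
    """Sum the log likelihood ratios of the observed feature values.

    Under the naive (conditional independence) assumption, this sum is the
    log of the factor by which the evidence multiplies the prior odds
    that the TF pair is cooperative.
    """
    return sum(on if x else off for x, (on, off) in zip(features, llrs))

# Hypothetical toy gold standard: rows are TF pairs, columns are three
# made-up binary genomic features.
positives = [(1, 1, 0), (1, 0, 1), (1, 1, 1)]
negatives = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)]

llrs = fit_log_likelihood_ratios(positives, negatives)
print(score_pair((1, 1, 1), llrs))  # positive: evidence favors cooperativity
print(score_pair((0, 0, 0), llrs))  # negative: evidence disfavors it
```

Ranking candidate TF pairs by this score and keeping those above a chosen cutoff would give a high-confidence set in the spirit of the abstract. The per-feature ratios also give a rough view of each feature's predictive power.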