A Bayesian data fusion based approach for learning genome-wide transcriptional regulatory networks.

Author information

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, 27100, Pavia, Italy.

Center for Biomedical Informatics and Biostatistics, Dept. of Medicine, The University of Arizona Health Sciences, 1230 Cherry Ave, Tucson, AZ, 85719, USA.

Publication information

BMC Bioinformatics. 2020 May 29;21(1):219. doi: 10.1186/s12859-020-3510-1.

DOI:10.1186/s12859-020-3510-1

PMID:32471360

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7257163/

Abstract

BACKGROUND

Reverse engineering of transcriptional regulatory networks (TRNs) from genomics data has long been a computational challenge in Systems Biology. The major issue is modeling the complex crosstalk among transcription factors (TFs) and their target genes with a method able to handle both the large number of interacting variables and the noise in the available heterogeneous experimental sources of information.

RESULTS

In this work, we propose a data fusion approach that integrates complementary omics data as prior knowledge within a Bayesian framework in order to learn and model large-scale transcriptional networks. We develop a hybrid structure-learning algorithm that combines TF ChIP-sequencing data and gene expression compendia to reconstruct TRNs from a genome-wide perspective. Applying our method to high-throughput data, we verified its ability to deal with the complexity of a genomic TRN, providing a snapshot of synergistic TF regulatory activity. Because data-driven prior knowledge is noisy and may contain incorrect information, we also tested the method's robustness to false priors on a benchmark dataset, comparing the proposed approach to other regulatory network reconstruction algorithms. We demonstrated the effectiveness of our framework by evaluating the structural commonalities between our learned genomic network and existing networks inferred by other methods based on DNA binding information.
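
To make the idea of using binding evidence as a structural prior concrete, the following is a minimal sketch and not the authors' algorithm; the abstract does not specify the scoring function or the search strategy. It assumes, purely for illustration, a linear-Gaussian BIC-style score for the expression data, independent per-edge prior probabilities (standing in for ChIP-seq binding evidence), and a greedy search over candidate TFs for a single target gene. All function and parameter names (`gaussian_bic`, `log_edge_prior`, `greedy_parents`, `prior_prob`, `max_parents`) are hypothetical.

```python
import numpy as np

def gaussian_bic(y, X):
    """BIC-style log score of a linear-Gaussian model y ~ X (higher is better)."""
    n = len(y)
    # Intercept column plus the selected regulators (X may have zero columns).
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = max(float(resid @ resid) / n, 1e-12)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return loglik - 0.5 * design.shape[1] * np.log(n)

def log_edge_prior(parents, prior_prob):
    """Log prior of a parent set under independent per-edge probabilities."""
    p = np.clip(np.asarray(prior_prob, dtype=float), 1e-6, 1 - 1e-6)
    in_set = np.zeros(len(p), dtype=bool)
    in_set[np.asarray(parents, dtype=int)] = True
    return float(np.sum(np.where(in_set, np.log(p), np.log1p(-p))))

def greedy_parents(y, tf_expr, prior_prob, max_parents=3):
    """Greedily add the TF that most improves log-likelihood score + log prior."""
    def score(ps):
        idx = np.asarray(ps, dtype=int)
        return gaussian_bic(y, tf_expr[:, idx]) + log_edge_prior(ps, prior_prob)

    parents, best = [], score([])
    while len(parents) < max_parents:
        candidates = [j for j in range(tf_expr.shape[1]) if j not in parents]
        if not candidates:
            break
        cand_score, j = max((score(parents + [j]), j) for j in candidates)
        if cand_score <= best:
            break  # no candidate improves the posterior-style score
        parents.append(j)
        best = cand_score
    return sorted(parents), best
```

Under these assumptions, a TF with strong binding evidence (a `prior_prob` entry near 1) needs less support from the expression data to be selected, while a TF with weak evidence must explain substantially more variance. A false prior can therefore be overridden when the expression data disagree, which is the kind of robustness the abstract refers to.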

CONCLUSIONS

This Bayesian omics-data fusion methodology provides a genome-wide picture of the transcriptional interplay and helps to unravel key hierarchical transcriptional interactions, which can subsequently be investigated. Given its robustness to noisy sources and its framework tailored to high-dimensional data, it represents a promising learning approach for multi-layered genomic data integration.
