Gene network inference by fusing data from diverse distributions.

Authors

Žitnik Marinka, Zupan Blaž

Affiliations

Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.

Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.

Publication information

Bioinformatics. 2015 Jun 15;31(12):i230-9. doi: 10.1093/bioinformatics/btv258.

DOI:10.1093/bioinformatics/btv258

PMID:26072487

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4542780/

Abstract

MOTIVATION

Markov networks are undirected graphical models that are widely used to infer relations between genes from experimental data. Their state-of-the-art inference procedures assume the data arise from a Gaussian distribution. High-throughput omics data, such as that from next generation sequencing, often violates this assumption. Furthermore, when collected data arise from multiple related but otherwise nonidentical distributions, their underlying networks are likely to have common features. New principled statistical approaches are needed that can deal with different data distributions and jointly consider collections of datasets.
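
The Gaussian assumption matters because, for jointly Gaussian variables, a missing edge in the Markov network corresponds to a zero partial correlation. The following is a minimal sketch of that idea, not code from the paper: the three-variable chain A - B - C, the coefficient 0.8, and the helper names `pearson` and `partial_corr` are illustrative assumptions.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after conditioning on a single variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simulate a Gaussian chain A - B - C: C depends on A only through B.
rng = random.Random(0)
n = 5000
a = [rng.gauss(0, 1) for _ in range(n)]
b = [0.8 * v + rng.gauss(0, 1) for v in a]
c = [0.8 * v + rng.gauss(0, 1) for v in b]

marginal_ac = pearson(a, c)             # clearly nonzero
conditional_ac = partial_corr(a, c, b)  # close to zero: no direct A - C edge
print(f"corr(A, C) = {marginal_ac:.3f}, partial corr(A, C | B) = {conditional_ac:.3f}")
```

In this simulation A and C are correlated marginally, but their partial correlation given B is close to zero, which a Gaussian Markov network reads as the absence of a direct edge between A and C. Count data from sequencing do not satisfy the Gaussian assumption behind this reading.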

RESULTS

We present FuseNet, a Markov network formulation that infers networks from a collection of nonidentically distributed datasets. Our approach is computationally efficient and general: given any number of distributions from an exponential family, FuseNet represents model parameters through shared latent factors that define neighborhoods of network nodes. In a simulation study, we demonstrate good predictive performance of FuseNet in comparison to several popular graphical models. We show its effectiveness in an application to breast cancer RNA-sequencing and somatic mutation data, a novel application of graphical models. Fusion of datasets offers substantial gains relative to inference of separate networks for each dataset. Our results demonstrate that network inference methods for non-Gaussian data can help in accurate modeling of the data generated by emergent high-throughput technologies.
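
For count data such as RNA-sequencing reads, one common exponential-family building block is a node-conditional Poisson model, in which the log-rate of one gene depends linearly on the counts of its candidate neighbors. The sketch below fits such a model for a single gene and a single candidate neighbor by Newton's method. It is an illustrative assumption, not FuseNet itself: the shared latent factorization across datasets is not reproduced, and the simulated data, coefficients, and function names are hypothetical.

```python
import math
import random


def sample_poisson(rng, rate):
    """Draw one Poisson variate with Knuth's multiplication method (small rates only)."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def fit_poisson_node(neighbor, target, iterations=25):
    """Fit log E[target] = intercept + slope * neighbor by Newton's method."""
    intercept = math.log(sum(target) / len(target))  # start from the no-edge model
    slope = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(neighbor, target):
            mu = math.exp(intercept + slope * x)
            g0 += y - mu           # gradient w.r.t. intercept
            g1 += (y - mu) * x     # gradient w.r.t. slope
            h00 += mu              # entries of the negative Hessian
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        intercept += (h11 * g0 - h01 * g1) / det
        slope += (h00 * g1 - h01 * g0) / det
    return intercept, slope


# Hypothetical data: the target gene's log-rate rises by 0.3 per neighbor count.
rng = random.Random(0)
neighbor = [sample_poisson(rng, 2.0) for _ in range(2000)]
target = [sample_poisson(rng, math.exp(0.2 + 0.3 * x)) for x in neighbor]

intercept, slope = fit_poisson_node(neighbor, target)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

A slope estimate clearly away from zero is the kind of evidence a node-wise method would use to place an edge between the two genes. Methods for real data typically add a sparsity penalty and fit all candidate neighbors jointly, which this sketch omits.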

AVAILABILITY AND IMPLEMENTATION

Source code is at https://github.com/marinkaz/fusenet.
