Selecting high-dimensional mixed graphical models using minimal AIC or BIC forests.

Affiliation

Institute of Genetics and Biotechnology, Faculty of Agricultural Sciences, Aarhus University, Aarhus, Denmark.

Publication information

BMC Bioinformatics. 2010 Jan 11;11:18. doi: 10.1186/1471-2105-11-18.

PMID:20064242

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2823705/

Abstract

BACKGROUND

Chow and Liu showed that the maximum likelihood tree for multivariate discrete distributions may be found using a maximum weight spanning tree algorithm, for example Kruskal's algorithm. The efficiency of the algorithm makes it tractable for high-dimensional problems.
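
To make the Chow and Liu construction concrete, here is a minimal Python sketch, assuming fully observed discrete data stored as one list per variable. It weights every pair of variables by its empirical mutual information and runs Kruskal's algorithm with a union-find structure to keep the heaviest set of edges that contains no cycle. The function names (`mutual_information`, `find`, `chow_liu_tree`) and the input layout are illustrative assumptions, not the authors' code.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(x, y):
    """Empirical mutual information (nats) between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum over observed cells of p(a,b) * log(p(a,b) / (p(a) * p(b))), written with counts.
    return sum(
        (nab / n) * math.log(nab * n / (px[a] * py[b]))
        for (a, b), nab in joint.items()
    )


def find(parent, v):
    """Union-find root lookup with path halving."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def chow_liu_tree(columns):
    """Maximum weight spanning tree over pairwise MI, via Kruskal's algorithm.

    columns: one list of discrete observations per variable (all the same length).
    Returns the tree as a list of (i, j) variable-index pairs.
    """
    p = len(columns)
    weighted = sorted(
        ((mutual_information(columns[i], columns[j]), i, j)
         for i, j in combinations(range(p), 2)),
        reverse=True,  # heaviest edges first
    )
    parent = list(range(p))
    tree = []
    for _, i, j in weighted:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:  # accept only edges that do not close a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

Because every pair receives a non-negative weight, the result always spans all p variables with p - 1 edges. Computing and sorting the p(p - 1)/2 pairwise weights dominates the cost.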

RESULTS

We extend Chow and Liu's approach in two ways: first, to find the forest optimizing a penalized likelihood criterion, for example AIC or BIC, and second, to handle data with both discrete and Gaussian variables. We apply the approach to three datasets: two from gene expression studies and the third from a genetics of gene expression study. The minimal BIC forest supplements a conventional analysis of differential expression by providing a tentative network for the differentially expressed genes. In the genetics of gene expression context the method identifies a network approximating the joint distribution of the DNA markers and the gene expression levels.
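
The penalized extension can be sketched the same way. It relies on the standard identity that the likelihood-ratio (deviance) statistic for adding edge (i, j) equals 2n times the empirical mutual information. Each edge's contribution to AIC or BIC is then 2n·I(i, j) - k·df(i, j), with k = 2 for AIC and k = log n for BIC. Keeping only edges with positive penalized weight and running Kruskal's algorithm on them yields a forest rather than a spanning tree. The helper names, the `edge_stats` input format and the Gaussian formula -½ log(1 - r²) are our illustrative choices. The sketch takes per-pair statistics from the caller and does not reproduce the paper's treatment of mixed discrete-Gaussian edges.

```python
import math


def discrete_df(levels_x, levels_y):
    """Degrees of freedom for an edge between two discrete variables."""
    return (levels_x - 1) * (levels_y - 1)


def gaussian_mi(x, y):
    """MI (nats) of a bivariate Gaussian: -0.5 * log(1 - r^2), r = sample correlation.

    Assumes |r| < 1; a perfect correlation would make the log undefined.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return -0.5 * math.log(1.0 - sxy * sxy / (sxx * syy))


def penalized_forest(p, n, edge_stats, criterion="BIC"):
    """Minimal AIC/BIC forest sketch: Kruskal on positive penalized weights.

    edge_stats: dict mapping (i, j) -> (mutual_information, df), supplied by the
    caller (a hypothetical input format chosen for this illustration).
    """
    if criterion not in ("AIC", "BIC"):
        raise ValueError("criterion must be 'AIC' or 'BIC'")
    k = 2.0 if criterion == "AIC" else math.log(n)
    weighted = []
    for (i, j), (mi, df) in edge_stats.items():
        w = 2 * n * mi - k * df  # decrease in AIC/BIC if edge (i, j) is added
        if w > 0:  # edges that do not improve the criterion are never added
            weighted.append((w, i, j))
    weighted.sort(reverse=True)

    parent = list(range(p))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    forest = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # skip edges that would close a cycle
            parent[ri] = rj
            forest.append((i, j))
    return forest
```

Because log n exceeds 2 once n > 7, BIC penalizes each degree of freedom more heavily than AIC, so the minimal BIC forest is typically sparser than the minimal AIC forest.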

CONCLUSIONS

The approach is generally useful as a preliminary step towards understanding the overall dependence structure of high-dimensional discrete and/or continuous data. Trees and forests are unrealistically simple models for biological systems, but can provide useful insights. Uses include the following: identification of distinct connected components, which can be analysed separately (dimension reduction); identification of neighbourhoods for more detailed analyses; as initial models for search algorithms with a larger search space, for example decomposable models or Bayesian networks; and identification of interesting features, such as hub nodes.
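
Two of the listed uses, splitting the forest into connected components and flagging hub nodes, reduce to simple graph traversals. Below is a minimal sketch, assuming the forest is given as an edge list over vertices 0..p-1. The function name and the degree threshold `min_degree=3` are hypothetical choices, since the abstract does not define what counts as a hub.

```python
from collections import defaultdict


def components_and_hubs(p, edges, min_degree=3):
    """Connected components (for separate analysis) and high-degree vertices.

    min_degree is a hypothetical hub threshold chosen for illustration.
    """
    adj = defaultdict(list)
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)

    seen = set()
    components = []
    for start in range(p):
        if start in seen:
            continue
        # Iterative depth-first search collects every vertex reachable from start.
        stack, comp = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.append(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        components.append(sorted(comp))

    hubs = [v for v in range(p) if len(adj[v]) >= min_degree]
    return components, hubs
```

Isolated vertices come back as singleton components, which in a minimal AIC or BIC forest correspond to variables with no edge strong enough to survive the penalty.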

Similar articles

A copula method for modeling directional dependence of genes.

BMC Bioinformatics. 2008 May 1;9:225. doi: 10.1186/1471-2105-9-225.

Incorporating prior biological knowledge for network-based differential gene expression analysis using differentially weighted graphical LASSO.

BMC Bioinformatics. 2017 Feb 10;18(1):99. doi: 10.1186/s12859-017-1515-1.

Inferring gene networks from discrete expression data.

Biostatistics. 2013 Sep;14(4):708-22. doi: 10.1093/biostatistics/kxt021. Epub 2013 Jul 18.

Information enhanced model selection for Gaussian graphical model with application to metabolomic data.

Biostatistics. 2022 Jul 18;23(3):926-948. doi: 10.1093/biostatistics/kxab006.

A GMM-IG framework for selecting genes as expression panel biomarkers.

Artif Intell Med. 2010 Feb-Mar;48(2-3):75-82. doi: 10.1016/j.artmed.2009.07.006. Epub 2009 Dec 8.

Weighted lasso in graphical Gaussian modeling for large gene network estimation based on microarray data.

Genome Inform. 2007;19:142-53.

A joint finite mixture model for clustering genes from independent Gaussian and beta distributed data.

BMC Bioinformatics. 2009 May 29;10:165. doi: 10.1186/1471-2105-10-165.

A novel approach for clustering proteomics data using Bayesian fast Fourier transform.

Bioinformatics. 2005 May 15;21(10):2210-24. doi: 10.1093/bioinformatics/bti383. Epub 2005 Mar 15.

Estimation of sparse directed acyclic graphs for multivariate counts data.

Biometrics. 2016 Sep;72(3):791-803. doi: 10.1111/biom.12467. Epub 2016 Feb 5.

Cited by

Associations Between Postoperative Symptom Clusters and Functional Status in Lung Cancer Patients: A Cross-Sectional Study.

Cancer Manag Res. 2025 Jun 12;17:1099-1111. doi: 10.2147/CMAR.S507420. eCollection 2025.

Spectral Clustering, Bayesian Spanning Forest, and Forest Process.

J Am Stat Assoc. 2024;119(547):2140-2153. doi: 10.1080/01621459.2023.2250098. Epub 2023 Sep 29.

Learning massive interpretable gene regulatory networks of the human brain by merging Bayesian networks.

PLoS Comput Biol. 2023 Dec 1;19(12):e1011443. doi: 10.1371/journal.pcbi.1011443. eCollection 2023 Dec.

Balanced Functional Module Detection in genomic data.

Bioinform Adv. 2021 Sep 16;1(1):vbab018. doi: 10.1093/bioadv/vbab018. eCollection 2021.

Information enhanced model selection for Gaussian graphical model with application to metabolomic data.

Biostatistics. 2022 Jul 18;23(3):926-948. doi: 10.1093/biostatistics/kxab006.

Integration of Metabolomic and Other Omics Data in Population-Based Study Designs: An Epidemiological Perspective.

Metabolites. 2019 Jun 18;9(6):117. doi: 10.3390/metabo9060117.

Brain Connectivity and Information-Flow Breakdown Revealed by a Minimum Spanning Tree-Based Analysis of MRI Data in Behavioral Variant Frontotemporal Dementia.

Front Neurosci. 2019 Mar 14;13:211. doi: 10.3389/fnins.2019.00211. eCollection 2019.

Sensitivity and specificity of information criteria.

Brief Bioinform. 2020 Mar 23;21(2):553-565. doi: 10.1093/bib/bbz016.

What Is the Influence of Morphological Knowledge in the Early Stages of Reading Acquisition Among Low SES Children? A Graphical Modeling Approach.

Front Psychol. 2018 Apr 19;9:547. doi: 10.3389/fpsyg.2018.00547. eCollection 2018.

Acquisition and persistence of strain-specific methicillin-resistant Staphylococcus aureus and their determinants in community nursing homes.

BMC Infect Dis. 2017 Dec 6;17(1):752. doi: 10.1186/s12879-017-2837-3.
