Graphical modeling of binary data using the LASSO: a simulation study.

Affiliation

Institute for Medical Informatics, Biometrics and Epidemiology, Ludwig-Maximilians-Universität München, Munich, Germany.

Publication information

BMC Med Res Methodol. 2012 Feb 21;12:16. doi: 10.1186/1471-2288-12-16.

DOI:10.1186/1471-2288-12-16

PMID:22353192

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3305667/

Abstract

BACKGROUND

Graphical models have been identified as a promising new approach to modeling high-dimensional clinical data. They provide a probabilistic tool to display, analyze and visualize net-like dependence structures by drawing a graph that describes the conditional dependencies between the variables. Until now, research has focused mainly on Gaussian graphical models for continuous data following a multivariate normal distribution, and satisfactory solutions for binary data have been missing. We adapted the method of Meinshausen and Bühlmann to binary data by using the LASSO for logistic regression. The objective of this paper was to examine the performance of the Bolasso in the development of graphical models for high-dimensional binary data. We hypothesized that the Bolasso outperforms competing LASSO methods in identifying graphical models.
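The nodewise approach mentioned above can be written out as follows. This is a sketch in assumed notation (the symbols $\theta_{jk}$, $\lambda$ and $n$ are not taken from the abstract), showing the usual form of a neighborhood-selection estimator in which each binary variable is regressed on all others by L1-penalized logistic regression:

```latex
% Local logistic model for node j given all other nodes (assumed notation)
P(X_j = 1 \mid x_{-j}) =
  \frac{\exp\bigl(\theta_{j0} + \sum_{k \neq j} \theta_{jk} x_k\bigr)}
       {1 + \exp\bigl(\theta_{j0} + \sum_{k \neq j} \theta_{jk} x_k\bigr)}

% LASSO estimate for node j from n observations; the intercept is not penalized
\hat{\theta}_j = \arg\min_{\theta_j}
  \Bigl\{ -\frac{1}{n} \sum_{i=1}^{n} \log P\bigl(x_{ij} \mid x_{i,-j}; \theta_j\bigr)
          + \lambda \sum_{k \neq j} \lvert \theta_{jk} \rvert \Bigr\}
```

Under this formulation, an edge between nodes $j$ and $k$ is drawn when $\hat{\theta}_{jk}$ and $\hat{\theta}_{kj}$ are both nonzero (AND rule) or when at least one of them is nonzero (OR rule). The abstract does not say which rule the authors used.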

METHODS

We analyzed the Bolasso for deriving graphical models in comparison with other LASSO-based methods. Model performance was assessed in a simulation study with random data generated via symmetric local logistic regression models and Gibbs sampling. The main outcome variables were the Structural Hamming Distance and the Youden Index. We applied the results of the simulation study to a real-life data set on the functioning of patients with head and neck cancer.
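The two outcome measures can be illustrated with a minimal Python sketch. It assumes undirected graphs given as lists of node pairs, takes the Structural Hamming Distance to be the number of edges that must be added or deleted to turn the estimated graph into the true one, and computes the Youden Index as sensitivity plus specificity minus one over all possible node pairs. The function names are hypothetical, and the paper's exact definitions may differ.

```python
from itertools import combinations


def edge_set(edges):
    """Represent undirected edges as frozensets so (a, b) equals (b, a)."""
    return {frozenset(e) for e in edges}


def graph_metrics(true_edges, est_edges, n_nodes):
    """Compare an estimated undirected graph with the true graph."""
    true_e, est_e = edge_set(true_edges), edge_set(est_edges)
    n_pairs = len(list(combinations(range(n_nodes), 2)))

    tp = len(true_e & est_e)   # edges correctly recovered
    fp = len(est_e - true_e)   # spurious edges
    fn = len(true_e - est_e)   # missed edges
    tn = n_pairs - tp - fp - fn

    # Assumption: for undirected graphs, SHD = additions + deletions needed.
    shd = fp + fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "shd": shd,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1.0,
    }


# Toy example on 4 nodes: one missed edge (2-3) and one spurious edge (1-3).
result = graph_metrics([(0, 1), (1, 2), (2, 3)], [(0, 1), (2, 1), (1, 3)], 4)
print(result["shd"], round(result["youden"], 3))
```

A lower Structural Hamming Distance and a higher Youden Index both indicate a better recovered graph. The first counts errors in absolute terms, while the second balances sensitivity against specificity.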

RESULTS

Bootstrap aggregating as incorporated in the Bolasso algorithm greatly improved performance at larger sample sizes. The number of bootstrap replicates had minimal impact on performance. Bolasso performed reasonably well with a cutpoint of 0.90 and a small penalty term. Optimal prediction for Bolasso leads to very conservative models in comparison with AIC, BIC or cross-validated optimal penalty terms.
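The role of the cutpoint can be sketched in plain Python for a single node's regression. The sketch assumes that the cutpoint is the minimum fraction of bootstrap replicates in which a variable must be selected, and it fits each L1-penalized logistic regression with a simple proximal gradient loop. The function names, the solver and the toy data are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t * |z|; returns exact zeros inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def l1_logistic(X, y, lam, n_iter=60):
    """Minimize mean log-loss + lam * sum(|beta_j|); intercept unpenalized."""
    n, p = len(X), len(X[0])
    # Step 1/L, with L bounded by 0.25 * largest squared row norm (intercept included).
    step = 4.0 / max(1.0 + sum(v * v for v in row) for row in X)
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for row, yi in zip(X, y):
            resid = sigmoid(b0 + sum(bj * v for bj, v in zip(beta, row))) - yi
            g0 += resid
            for j, v in enumerate(row):
                g[j] += resid * v
        b0 -= step * g0 / n
        beta = [soft_threshold(bj - step * gj / n, step * lam)
                for bj, gj in zip(beta, g)]
    return b0, beta


def bolasso_support(X, y, lam, n_boot=20, cutpoint=0.90, seed=0):
    """Keep variables selected in at least `cutpoint` of the bootstrap fits."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        _, beta = l1_logistic([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, bj in enumerate(beta):
            counts[j] += bj != 0.0
    freq = [c / n_boot for c in counts]
    return [j for j, f in enumerate(freq) if f >= cutpoint], freq


# Toy data (assumption): the response simply copies column 0; columns 1 and 2 are unrelated.
X = [[i % 2, (i // 2) % 2, (i // 4) % 2] for i in range(120)]
y = [row[0] for row in X]
support, freq = bolasso_support(X, y, lam=0.05)
print(support, freq)
```

Raising the cutpoint toward 1.0 keeps only variables that are selected in nearly every replicate, which fits the conservative behavior and the focus on fewer false discoveries described in the abstract. For a graphical model, the same procedure would be repeated with each variable in turn as the response.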

CONCLUSIONS

Bootstrap aggregating may improve variable selection if the underlying selection process is not too unstable due to small sample size and if one is mainly interested in reducing the false discovery rate. We propose using the Bolasso for graphical modeling in large sample sizes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cc4/3305667/c150ccd78813/1471-2288-12-16-1.jpg
