Graphical modeling of binary data using the LASSO: a simulation study.

Affiliation

Institute for Medical Informatics, Biometrics and Epidemiology, Ludwig-Maximilians-Universität München, Munich, Germany.

Publication information

BMC Med Res Methodol. 2012 Feb 21;12:16. doi: 10.1186/1471-2288-12-16.

Abstract

BACKGROUND

Graphical models have been identified as a promising new approach to modeling high-dimensional clinical data. They provide a probabilistic tool to display, analyze and visualize net-like dependence structures by drawing a graph that describes the conditional dependencies between the variables. Until now, research has focused mainly on building Gaussian graphical models for continuous multivariate data following a multivariate normal distribution, and satisfactory solutions for binary data have been missing. We adapted the method of Meinshausen and Bühlmann to binary data and used the LASSO for logistic regression. The objective of this paper was to examine the performance of the Bolasso in developing graphical models for high-dimensional binary data. We hypothesized that the Bolasso would outperform competing LASSO methods in identifying graphical models.
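As an illustration of the node-wise approach described above, one common way to write an L1-penalized logistic regression of a single binary variable on all the others is given below. The notation is assumed for this sketch and does not come from the abstract: x_{is} is the value of variable s for observation i, n is the sample size, \theta_0 and \theta_t are the intercept and coefficients, and \lambda is the penalty term.

```latex
\hat{\theta}^{(s)} = \arg\min_{\theta_0,\,\theta}
  \left\{ -\frac{1}{n}\sum_{i=1}^{n}
    \Big[ x_{is}\,\eta_{is} - \log\!\big(1 + e^{\eta_{is}}\big) \Big]
    + \lambda \sum_{t \neq s} |\theta_{t}| \right\},
\qquad
\eta_{is} = \theta_{0} + \sum_{t \neq s} \theta_{t}\, x_{it}
```

Under this formulation, an edge between s and t would typically be drawn when \hat{\theta}^{(s)}_t \neq 0 and \hat{\theta}^{(t)}_s \neq 0 (an AND rule) or when at least one of them is nonzero (an OR rule). Which rule the study used is not stated in the abstract.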

METHODS

We analyzed the Bolasso for deriving graphical models in comparison with other LASSO-based methods. Model performance was assessed in a simulation study with random data generated via symmetric local logistic regression models and Gibbs sampling. The main outcome variables were the Structural Hamming Distance and the Youden Index. We applied the results of the simulation study to a real-life data set of functioning data from patients with head and neck cancer.
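The two outcome measures can be sketched for undirected graphs as follows. This is a minimal illustration, not the study's code: the function names and the toy edge lists are hypothetical, and for undirected graphs the Structural Hamming Distance is taken to be the number of edges that must be added or deleted to turn the estimated graph into the true one.

```python
def edge_set(edges):
    """Normalize undirected edges so that (a, b) and (b, a) compare equal."""
    return {frozenset(e) for e in edges}


def structural_hamming_distance(true_edges, est_edges):
    """Count the edges present in exactly one of the two undirected graphs."""
    return len(edge_set(true_edges) ^ edge_set(est_edges))


def youden_index(true_edges, est_edges, n_nodes):
    """Sensitivity + specificity - 1, computed over all possible node pairs.

    Assumes the true graph has at least one edge and at least one non-edge,
    so that neither denominator is zero.
    """
    true_e, est_e = edge_set(true_edges), edge_set(est_edges)
    n_pairs = n_nodes * (n_nodes - 1) // 2
    tp = len(true_e & est_e)      # edges correctly found
    fn = len(true_e - est_e)      # true edges that were missed
    fp = len(est_e - true_e)      # edges reported but not in the true graph
    tn = n_pairs - tp - fn - fp   # non-edges correctly left out
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


# Hypothetical toy graphs on 4 nodes.
true_graph = [(0, 1), (1, 2), (2, 3)]
estimated_graph = [(1, 0), (1, 2), (0, 3)]
shd = structural_hamming_distance(true_graph, estimated_graph)
youden = youden_index(true_graph, estimated_graph, n_nodes=4)
print(shd, youden)
```

A lower Structural Hamming Distance means the estimated graph is closer to the true one, while a Youden Index near 1 means that both sensitivity and specificity are high.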

RESULTS

Bootstrap aggregating, as incorporated in the Bolasso algorithm, greatly improved performance at larger sample sizes. The number of bootstrap replicates had minimal impact on performance. Bolasso performed reasonably well with a cutpoint of 0.90 and a small penalty term. Optimal prediction for Bolasso leads to very conservative models in comparison with AIC, BIC or cross-validated optimal penalty terms.
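The role of the cutpoint can be sketched as follows: a selection procedure is run on each bootstrap replicate, and only variables selected in at least the cutpoint share of replicates are kept. This is an illustrative sketch under stated assumptions. The function names are hypothetical, and `toy_selector` is a deliberately simple stand-in for an L1-penalized logistic regression fit, which is what an actual Bolasso would use.

```python
import random


def bolasso_support(rows, select_fn, n_boot=100, cutpoint=0.90, seed=0):
    """Keep items that select_fn picks in at least `cutpoint` of the replicates."""
    rng = random.Random(seed)
    n = len(rows)
    counts = {}
    for _ in range(n_boot):
        # Bootstrap replicate: draw n rows with replacement.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        for item in select_fn(sample):
            counts[item] = counts.get(item, 0) + 1
    freqs = {item: c / n_boot for item, c in counts.items()}
    support = {item for item, f in freqs.items() if f >= cutpoint}
    return support, freqs


def toy_selector(sample, threshold=0.5):
    """Hypothetical stand-in for a LASSO fit; rows are (y, x_0, x_1, ...).

    Selects predictor j when the proportion of y == 1 differs by more than
    `threshold` between the groups x_j == 1 and x_j == 0.
    """
    selected = set()
    for j in range(len(sample[0]) - 1):
        y1 = [r[0] for r in sample if r[j + 1] == 1]
        y0 = [r[0] for r in sample if r[j + 1] == 0]
        if y1 and y0 and abs(sum(y1) / len(y1) - sum(y0) / len(y0)) > threshold:
            selected.add(j)
    return selected


# Hypothetical toy data: y copies x_0 exactly, x_1 is pure noise.
data_rng = random.Random(1)
rows = []
for _ in range(200):
    x0 = data_rng.randint(0, 1)
    x1 = data_rng.randint(0, 1)
    rows.append((x0, x0, x1))

support, freqs = bolasso_support(rows, toy_selector, n_boot=100, cutpoint=0.90)
print(support, freqs)
```

Raising the cutpoint makes the selection stricter, which fits the abstract's emphasis on reducing false discoveries. With small samples the per-replicate selections become unstable, so few variables reach the cutpoint.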

CONCLUSIONS

Bootstrap aggregating may improve variable selection if the underlying selection process is not too unstable due to small sample size and if one is mainly interested in reducing the false discovery rate. We propose using the Bolasso for graphical modeling with large sample sizes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cc4/3305667/c150ccd78813/1471-2288-12-16-1.jpg
