A Simple Information Criterion for Variable Selection in High-Dimensional Regression.

Authors

Pluntz Matthieu, Dalmasso Cyril, Tubert-Bitter Pascale, Ahmed Ismaïl

Affiliations

High-Dimensional Biostatistics for Drug Safety and Genomics, CESP, Université Paris-Saclay, UVSQ, Université Paris-Sud, Inserm, Villejuif, France.

Laboratoire de Mathématiques et Modélisation d'Évry (LaMME), Université d'Evry Val d'Essonne, Évry, France.

Publication information

Stat Med. 2025 Jan 15;44(1-2):e10275. doi: 10.1002/sim.10275. Epub 2024 Dec 12.

Abstract

High-dimensional regression problems, for example with genomic or drug exposure data, typically involve automated selection of a sparse set of regressors. Penalized regression methods like the LASSO can deliver a family of candidate sparse models. To select one, there are criteria balancing log-likelihood and model size, the most common being AIC and BIC. These two methods do not take into account the implicit multiple testing performed when selecting variables in a high-dimensional regression, which makes them too liberal. We propose the extended AIC (EAIC), a new information criterion for sparse model selection in high-dimensional regressions. It allows for asymptotic FWER control when the candidate regressors are independent. It is based on a simple formula involving model log-likelihood, model size, the total number of candidate regressors, and the FWER target. In a simulation study over a wide range of linear and logistic regression settings, we assessed the variable selection performance of the EAIC and of other information criteria (including some that also use the number of candidate regressors: mBIC, mAIC, and EBIC) in conjunction with the LASSO. Our method controls the FWER in nearly all settings, in contrast to the AIC and BIC, which produce many false positives. We also illustrate it for the automated signal detection of adverse drug reactions on the French pharmacovigilance spontaneous reporting database.
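As a rough illustration of how a criterion that balances log-likelihood against model size picks one model from a candidate family, the sketch below scores a few candidates with the standard AIC and BIC formulas and with EBIC in its usual form (BIC plus 2γ log C(p, k)). The EAIC formula is not stated in the abstract, so it is deliberately not reproduced here. The candidate log-likelihoods, the sample size `n`, the number of candidate regressors `p`, and the helper names are all hypothetical values chosen for illustration, not results from the paper.

```python
import math

def aic(loglik, k):
    # Standard AIC: -2 log-likelihood + 2 * (number of selected regressors).
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    # Standard BIC: -2 log-likelihood + k * log(sample size).
    return -2.0 * loglik + k * math.log(n)

def ebic(loglik, k, n, p, gamma=1.0):
    # EBIC in its usual form: BIC + 2 * gamma * log(C(p, k)),
    # where p is the total number of candidate regressors.
    return bic(loglik, k, n) + 2.0 * gamma * math.log(math.comb(p, k))

def select(candidates, criterion):
    # Return the candidate model with the smallest criterion value.
    return min(candidates, key=lambda m: criterion(m["loglik"], m["k"]))

# Hypothetical candidate family, e.g. models along a LASSO path
# (made-up log-likelihoods; k = number of selected regressors).
n, p = 200, 1000
candidates = [
    {"k": 0, "loglik": -300.0},
    {"k": 2, "loglik": -280.0},
    {"k": 5, "loglik": -274.0},
    {"k": 12, "loglik": -262.0},
]

best_aic = select(candidates, aic)
best_bic = select(candidates, lambda ll, k: bic(ll, k, n))
best_ebic = select(candidates, lambda ll, k: ebic(ll, k, n, p))

print("AIC picks k =", best_aic["k"])
print("BIC picks k =", best_bic["k"])
print("EBIC picks k =", best_ebic["k"])
```

With these made-up numbers, AIC selects the 12-variable model while BIC and EBIC both select the 2-variable one, which matches the abstract's point that AIC is the more liberal criterion. Any other penalty that depends on `p` can be passed to `select` in the same way.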

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0d6d/11702156/5865ba276d43/SIM-44-0-g003.jpg
