Variable selection in modelling clustered data via within-cluster resampling.

Authors

Ye Shangyuan, Yu Tingting, Caroff Daniel A, Huang Susan S, Zhang Bo, Wang Rui

Affiliations

Biostatistics Shared Resource, Knight Cancer Institute, Oregon Health & Science University, Oregon, U.S.A.

Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Massachusetts, U.S.A.

Publication

Can J Stat. 2025 Mar;53(1). doi: 10.1002/cjs.11824. Epub 2024 Aug 1.

DOI:10.1002/cjs.11824
PMID:40040799
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11878247/
Abstract

In many biomedical applications, there is a need to build risk-adjustment models based on clustered data. However, methods for variable selection that are applicable to clustered discrete data settings with a large number of candidate variables and potentially large cluster sizes are lacking. We develop a new variable selection approach that combines within-cluster resampling techniques with penalized likelihood methods to select variables for high-dimensional clustered data. We derive an upper bound on the expected number of falsely selected variables, demonstrate the oracle properties of the proposed method, and evaluate the finite sample performance of the method through extensive simulations. We illustrate the proposed approach using a colon surgical site infection data set consisting of 39,468 individuals from 149 hospitals to build risk-adjustment models that account for both the main effects of various risk factors and their two-way interactions.
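The abstract describes the approach only at a high level. The sketch below illustrates the general idea of pairing within-cluster resampling with a penalized likelihood, under stated assumptions: a binary outcome, an L1 (lasso) penalty fitted by proximal gradient descent, and aggregation by selection frequency across resamples in the style of stability selection. The function names, the penalty level `lam`, and the `threshold` are hypothetical choices for illustration. This is not the authors' algorithm, and it does not implement their bound on falsely selected variables.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_logistic(X, y, lam, step=0.5, iters=200):
    """L1-penalized logistic regression fitted by proximal gradient (ISTA).

    Minimizes mean negative log-likelihood + lam * sum(|beta_j|);
    the intercept is left unpenalized. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            eta = b0 + sum(bj * xj for bj, xj in zip(beta, xi))
            r = sigmoid(eta) - yi  # residual entering the log-likelihood gradient
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        b0 -= step * g0  # plain gradient step for the unpenalized intercept
        beta = [soft_threshold(bj - step * gj, step * lam)
                for bj, gj in zip(beta, g)]
    return b0, beta


def wcr_selection_frequency(clusters, lam, n_resamples=50, seed=0):
    """Within-cluster resampling combined with the lasso (hypothetical sketch).

    `clusters` is a list of clusters, each a list of (x, y) pairs.
    Each resample draws ONE observation per cluster, so, assuming clusters
    are independent of one another, the resampled observations are
    independent. The lasso is fitted to every resample, and the share of
    resamples in which each coefficient is nonzero is returned.
    """
    rng = random.Random(seed)
    p = len(clusters[0][0][0])
    counts = [0] * p
    for _ in range(n_resamples):
        draw = [rng.choice(members) for members in clusters]
        X = [x for x, _ in draw]
        y = [label for _, label in draw]
        _, beta = lasso_logistic(X, y, lam)
        for j, bj in enumerate(beta):
            if bj != 0.0:
                counts[j] += 1
    return [c / n_resamples for c in counts]


def select_stable(freq, threshold=0.6):
    """Keep variables selected in at least `threshold` of the resamples."""
    return [j for j, f in enumerate(freq) if f >= threshold]
```

A call such as `select_stable(wcr_selection_frequency(clusters, lam=0.1), threshold=0.6)` returns the indices of variables that remain selected in most resamples. Drawing a single member per cluster removes within-cluster correlation from each fit without modelling it explicitly, which is why this style of resampling suits large clusters. Thresholding selection frequencies is the stability-selection device; the paper's actual aggregation rule and its error control may differ.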

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/580a/11878247/687b879cb3bd/nihms-1997645-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/580a/11878247/1a990c26b72b/nihms-1997645-f0001.jpg

Similar articles

1
Variable selection in modelling clustered data via within-cluster resampling.
Can J Stat. 2025 Mar;53(1). doi: 10.1002/cjs.11824. Epub 2024 Aug 1.
2
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
3
A model selection criterion for clustered survival analysis with informative cluster size.
Pharm Stat. 2023 Jan;22(1):79-95. doi: 10.1002/pst.2261. Epub 2022 Aug 23.
4
Variable selection methods for identifying predictor interactions in data with repeatedly measured binary outcomes.
J Clin Transl Sci. 2020 Nov 16;5(1):e59. doi: 10.1017/cts.2020.556.
5
Variable selection in the presence of missing data: resampling and imputation.
Biostatistics. 2015 Jul;16(3):596-610. doi: 10.1093/biostatistics/kxv003. Epub 2015 Feb 18.
6
A semiparametric joint model for cluster size and subunit-specific interval-censored outcomes.
Biometrics. 2023 Sep;79(3):2010-2022. doi: 10.1111/biom.13795. Epub 2022 Dec 15.
7
Variable selection for high dimensional multivariate outcomes.
Stat Sin. 2014 Oct;24(4):1633-1654. doi: 10.5705/ss.2013.019.
8
Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications.
J Mach Learn Res. 2012 Jun 1;13:1839-1864.
9
Variable Selection in Semiparametric Regression Modeling.
Ann Stat. 2008;36(1):261-286. doi: 10.1214/009053607000000604.
10
Penalized variable selection for cause-specific hazard frailty models with clustered competing-risks data.
Stat Med. 2021 Dec 20;40(29):6541-6557. doi: 10.1002/sim.9197. Epub 2021 Sep 20.