High-dimensional variable selection accounting for heterogeneity in regression coefficients across multiple data sources.

Authors

Yu Tingting, Ye Shangyuan, Wang Rui

Affiliations

Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts, U.S.A.

Biostatistics Shared Resource, Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon, U.S.A.

Publication information

Can J Stat. 2024 Sep;52(3):900-923. doi: 10.1002/cjs.11793. Epub 2023 Aug 19.

Abstract

When analyzing data combined from multiple sources (e.g., hospitals, studies), the heterogeneity across different sources must be accounted for. In this paper, we consider high-dimensional linear regression models for integrative data analysis. We propose a new adaptive clustering penalty (ACP) method to simultaneously select variables and cluster source-specific regression coefficients with sub-homogeneity. We show that the estimator based on the ACP method enjoys a strong oracle property under certain regularity conditions. We also develop an efficient algorithm based on the alternating direction method of multipliers (ADMM) for parameter estimation. We conduct simulation studies to compare the performance of the proposed method to three existing methods (a fused LASSO with adjacent fusion, a pairwise fused LASSO, and a multi-directional shrinkage penalty method). Finally, we apply the proposed method to the multi-center Childhood Adenotonsillectomy Trial to identify sub-homogeneity in the treatment effects across different study sites.
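
The abstract names ADMM as the estimation algorithm but gives neither the ACP penalty nor the update steps. As a rough illustration of the general ADMM pattern only, the sketch below solves the ordinary LASSO, which is a different and simpler penalty than ACP. The function names, the parameters `lam`, `rho`, and `n_iter`, and the simulated data are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def soft_threshold(v, kappa):
    """Elementwise soft-thresholding: the proximal operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def admm_lasso(X, y, lam, rho=1.0, n_iter=500):
    """Generic ADMM for the plain LASSO (NOT the ACP method of the paper).

    Minimizes 0.5 * ||y - X b||^2 + lam * ||z||_1 subject to b = z,
    using the scaled dual variable u.
    """
    n, p = X.shape
    Xty = X.T @ y
    # Factor (X'X + rho I) once; it is reused in every b-update.
    L = np.linalg.cholesky(X.T @ X + rho * np.eye(p))
    z = np.zeros(p)
    u = np.zeros(p)
    for _ in range(n_iter):
        # b-update: ridge-type linear system solved via the Cholesky factor.
        rhs = Xty + rho * (z - u)
        b = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # z-update: soft-thresholding produces exact zeros (variable selection).
        z = soft_threshold(b + u, lam / rho)
        # Dual update: accumulate the constraint residual b - z.
        u = u + b - z
    return z


if __name__ == "__main__":
    # Hypothetical simulated data: 2 active predictors out of 10.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))
    beta_true = np.zeros(10)
    beta_true[:2] = [3.0, -2.0]
    y = X @ beta_true + 0.1 * rng.normal(size=100)
    print(admm_lasso(X, y, lam=5.0))
```

A fusion-type penalty such as the pairwise fused LASSO mentioned in the abstract would typically replace the constraint `b = z` with constraints on differences between source-specific coefficients, while keeping the same alternate-then-update-dual structure.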
