A synthetic data integration framework to leverage external summary-level information from heterogeneous populations.

Affiliation

Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.

Publication information

Biometrics. 2023 Dec;79(4):3831-3845. doi: 10.1111/biom.13852. Epub 2023 Apr 4.

Abstract

There is a growing need for flexible, general frameworks that integrate individual-level data with external summary information to improve statistical inference. External information relevant to a risk prediction model may come in multiple forms, such as regression coefficient estimates or predicted values of the outcome variable. Different external models may use different sets of predictors, and the algorithm each used to predict the outcome Y from these predictors may or may not be known. The populations underlying the external models may differ from each other and from the internal study population. Motivated by a prostate cancer risk prediction problem in which novel biomarkers are measured only in the internal study, this paper proposes an imputation-based methodology whose goal is to fit a target regression model with all available predictors in the internal study while using summary information from external models that may have used only a subset of the predictors. The method allows for heterogeneity of covariate effects across the external populations. The proposed approach generates synthetic outcome data in each external population and then uses stacked multiple imputation to create a long dataset with complete covariate information. The final analysis of the stacked imputed data is conducted by weighted regression. This flexible and unified approach can improve the statistical efficiency of the estimated coefficients in the internal study, improve predictions by using even partial information from models that include only a subset of the covariates used in the internal study, and provide statistical inference for the external population, whose covariate effects may differ from those of the internal population.
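The pipeline described in the abstract (synthetic outcomes, stacked multiple imputation, weighted regression) can be illustrated with a toy sketch. This is not the authors' implementation: the simulated data, the single external linear model with known coefficients and residual standard deviation, the resampling of internal covariates to stand in for the external population, the normal linear imputation model, and the 1/M stacking weights are all assumptions made for illustration. It also ignores heterogeneity of covariate effects and the variance correction that stacked analyses need.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols(design, y):
    """Return least-squares coefficients and the residual standard deviation."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sd = np.sqrt(resid @ resid / (len(y) - design.shape[1]))
    return coef, sd

# Internal study (simulated, hypothetical): X is a predictor shared with the
# external model, B is a biomarker measured only internally, Y is the outcome.
n_int = 300
x_int = rng.normal(size=n_int)
b_int = 0.5 * x_int + rng.normal(size=n_int)
y_int = 1.0 + 2.0 * x_int + 1.5 * b_int + rng.normal(size=n_int)

# External summary information (assumed known): a reduced model Y ~ X.
# These values are the reduced-model parameters implied by the simulation above.
ext_coef = np.array([1.0, 2.75])
ext_sd = np.sqrt(3.25)

# Step 1: generate synthetic outcomes from the external model.
# Assumption: external covariates are approximated by resampling internal X.
n_syn = 1000
x_syn = rng.choice(x_int, size=n_syn, replace=True)
y_syn = ext_coef[0] + ext_coef[1] * x_syn + rng.normal(scale=ext_sd, size=n_syn)

# Step 2: fit an imputation model for B given (X, Y) on the internal data.
imp_design = np.column_stack([np.ones(n_int), x_int, y_int])
imp_coef, imp_sd = ols(imp_design, b_int)

# Step 3: impute B in the synthetic data M times and stack with internal data.
# Internal rows get weight 1; each imputed copy of a synthetic row gets 1/M.
n_imp = 5
syn_design = np.column_stack([np.ones(n_syn), x_syn, y_syn])
xs, bs, ys, ws = [x_int], [b_int], [y_int], [np.ones(n_int)]
for _ in range(n_imp):
    b_draw = syn_design @ imp_coef + rng.normal(scale=imp_sd, size=n_syn)
    xs.append(x_syn)
    bs.append(b_draw)
    ys.append(y_syn)
    ws.append(np.full(n_syn, 1.0 / n_imp))
x_all, b_all, y_all, w_all = map(np.concatenate, (xs, bs, ys, ws))

# Step 4: weighted least squares for the target model Y ~ X + B,
# implemented by scaling each row by the square root of its weight.
target_design = np.column_stack([np.ones(len(y_all)), x_all, b_all])
root_w = np.sqrt(w_all)
beta_hat, *_ = np.linalg.lstsq(
    target_design * root_w[:, None], y_all * root_w, rcond=None
)
print("Target model estimates (intercept, X, B):", beta_hat)
```

In this sketch the imputation parameters are plugged in rather than drawn from a posterior, so it is simpler than a proper multiple-imputation procedure. Standard errors from the final fit would also be too small without an adjustment for the stacked, partly synthetic rows.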
