A synthetic data integration framework to leverage external summary-level information from heterogeneous populations.

Affiliation

Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.

Publication information

Biometrics. 2023 Dec;79(4):3831-3845. doi: 10.1111/biom.13852. Epub 2023 Apr 4.

DOI: 10.1111/biom.13852
PMID: 36876883
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10480346/

Abstract

There is a growing need for flexible general frameworks that integrate individual-level data with external summary information for improved statistical inference. External information relevant for a risk prediction model may come in multiple forms, such as regression coefficient estimates or predicted values of the outcome variable. Different external models may use different sets of predictors, and the algorithm each used to predict the outcome Y given these predictors may or may not be known. The underlying populations corresponding to the external models may differ from each other and from the internal study population. Motivated by a prostate cancer risk prediction problem where novel biomarkers are measured only in the internal study, this paper proposes an imputation-based methodology. The goal is to fit a target regression model with all available predictors in the internal study while utilizing summary information from external models that may have used only a subset of the predictors. The method allows for heterogeneity of covariate effects across the external populations. The proposed approach generates synthetic outcome data in each external population and then uses stacked multiple imputation to create a long dataset with complete covariate information. The final analysis of the stacked imputed data is conducted by weighted regression. This flexible and unified approach can improve the statistical efficiency of the estimated coefficients in the internal study, improve predictions by utilizing even partial information available from models that use a subset of the full set of covariates used in the internal study, and provide statistical inference for the external population with potentially different covariate effects from the internal population.
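
The three steps named in the abstract (synthetic outcomes, stacked multiple imputation, weighted regression) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a linear model with one shared predictor `x` and one internal-only biomarker `b`, a single external model that reports an intercept and slope for Y given `x`, external covariates that resemble the internal ones, and imputation weights of 1/M. All function names, parameters, and simulated values below are hypothetical, and the sketch does not model heterogeneous covariate effects across populations.

```python
import random

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def wls(rows, y, w):
    """Weighted least squares via the normal equations (X'WX) beta = X'Wy."""
    p = len(rows[0])
    xtwx = [[sum(wi * r[j] * r[k] for r, wi in zip(rows, w)) for k in range(p)]
            for j in range(p)]
    xtwy = [sum(wi * r[j] * yi for r, yi, wi in zip(rows, y, w)) for j in range(p)]
    return solve(xtwx, xtwy)

def fit_with_external(internal, ext_coef, ext_sd, n_syn=500, n_imp=5, seed=0):
    """internal: list of (x, b, y). ext_coef: (intercept, slope) of the external
    model for Y given x. ext_sd: assumed residual SD of that external model."""
    rng = random.Random(seed)
    # Step 1: synthetic external data. Covariates are resampled from the
    # internal study (an assumption); outcomes come from the external model.
    syn = []
    for _ in range(n_syn):
        x = rng.choice(internal)[0]
        syn.append((x, ext_coef[0] + ext_coef[1] * x + rng.gauss(0, ext_sd)))
    # Step 2: imputation model for the biomarker, b ~ 1 + x + y, fitted on
    # internal data. Parameter uncertainty is ignored to keep the sketch short.
    design = [[1.0, x, y] for x, _, y in internal]
    b_obs = [b for _, b, _ in internal]
    g = wls(design, b_obs, [1.0] * len(internal))
    resid = [bv - sum(gj * dj for gj, dj in zip(g, d)) for d, bv in zip(design, b_obs)]
    sd_b = (sum(r * r for r in resid) / (len(resid) - 3)) ** 0.5
    # Stack the internal rows (weight 1) with n_imp imputed copies of the
    # synthetic rows (weight 1 / n_imp each).
    rows = [[1.0, x, b] for x, b, _ in internal]
    ys = [y for _, _, y in internal]
    ws = [1.0] * len(internal)
    for _ in range(n_imp):
        for x, y in syn:
            b_imp = g[0] + g[1] * x + g[2] * y + rng.gauss(0, sd_b)
            rows.append([1.0, x, b_imp])
            ys.append(y)
            ws.append(1.0 / n_imp)
    # Step 3: weighted regression of y on (1, x, b) over the stacked data.
    return wls(rows, ys, ws)

# Simulated example: true target model y = 1 + 2x + 0.5b + noise.
sim = random.Random(1)
internal = []
for _ in range(200):
    x = sim.gauss(0, 1)
    b = 0.3 * x + sim.gauss(0, 1)
    internal.append((x, b, 1.0 + 2.0 * x + 0.5 * b + sim.gauss(0, 1)))
# The implied reduced model for y given x has slope 2 + 0.5 * 0.3 = 2.15
# and residual variance 0.25 + 1 = 1.25.
beta = fit_with_external(internal, ext_coef=(1.0, 2.15), ext_sd=1.25 ** 0.5)
print([round(v, 2) for v in beta])
```

In this simulation the external model carries no direct information about the biomarker, yet the stacked rows still add information about the intercept and the coefficient of `x`. A fuller treatment would also propagate imputation uncertainty into the standard errors, which the sketch omits.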
