Empirical Bayes Estimation and Prediction Using Summary-Level Information From External Big Data Sources Adjusting for Violations of Transportability.

Authors

Estes Jason P, Mukherjee Bhramar, Taylor Jeremy M G

Affiliation

University of Michigan, MI 48109, USA.

Publication

Stat Biosci. 2018 Dec;10(3):568-586. doi: 10.1007/s12561-018-9217-4. Epub 2018 May 14.

DOI: 10.1007/s12561-018-9217-4
PMID: 31123532
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6529204/
Abstract

Large external data sources may be available to augment studies that collect data to address a specific research objective. In this article we consider the problem of building regression models for prediction based on individual-level data from an "internal" study while incorporating summary information from an "external" big data source. We extend the work of Chatterjee et al (2016a) by introducing an adaptive empirical Bayes shrinkage estimator that uses the external summary-level information and the internal data to trade bias with variance for protection against departures in the conditional probability distribution of the outcome given a set of covariates between the two populations. We use simulation studies and a real data application using external summary information from the Prostate Cancer Prevention Trial to assess the performance of the proposed methods in contrast to maximum likelihood estimation and the constrained maximum likelihood (CML) method developed by Chatterjee et al (2016a). Our simulation studies show that the CML method can be biased and inefficient when the assumption of a transportable covariate distribution between the external and internal populations is violated, and our empirical Bayes estimator provides protection against bias and loss of efficiency.
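
The bias-variance trade-off described in the abstract can be illustrated with a small sketch. The form below follows the general empirical Bayes-type shrinkage construction of the 2008 Biometrics paper listed on this page, and is not necessarily the exact estimator proposed in this article. The function name `eb_shrinkage` and its arguments are assumptions made for illustration, and the CML estimate is taken as an input rather than computed.

```python
import numpy as np

def eb_shrinkage(beta_mle, beta_cml, cov_mle):
    """Illustrative EB-type shrinkage between two estimates (hypothetical helper).

    beta_mle: estimate from the internal data alone (unbiased, higher variance)
    beta_cml: estimate constrained by external summary information
              (more efficient, but biased if transportability fails)
    cov_mle:  estimated covariance matrix of beta_mle
    """
    beta_mle = np.asarray(beta_mle, dtype=float)
    beta_cml = np.asarray(beta_cml, dtype=float)
    cov_mle = np.asarray(cov_mle, dtype=float)

    # Observed disagreement between the two estimates; a large value
    # suggests the external information does not transport well.
    psi = beta_mle - beta_cml

    # Matrix weight placed on the constrained estimate. It approaches the
    # identity when psi is small relative to cov_mle, and shrinks toward
    # zero along psi when the disagreement is large.
    weight = cov_mle @ np.linalg.inv(cov_mle + np.outer(psi, psi))

    return beta_mle + weight @ (beta_cml - beta_mle)
```

In the one-parameter case the weight on the constrained estimate reduces to V / (V + psi^2), where V is the variance of the internal-only estimate. When the two estimates agree the result equals the constrained estimate, and when they disagree strongly it stays close to the internal-only estimate.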

Similar articles

1
Empirical Bayes Estimation and Prediction Using Summary-Level Information From External Big Data Sources Adjusting for Violations of Transportability.
Stat Biosci. 2018 Dec;10(3):568-586. doi: 10.1007/s12561-018-9217-4. Epub 2018 May 14.
2
Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources.
J Am Stat Assoc. 2016 Mar;111(513):107-117. doi: 10.1080/01621459.2015.1123157. Epub 2016 May 5.
3
Exploiting gene-environment independence for analysis of case-control studies: an empirical Bayes-type shrinkage estimator to trade-off between bias and efficiency.
Biometrics. 2008 Sep;64(3):685-694. doi: 10.1111/j.1541-0420.2007.00953.x. Epub 2007 Dec 20.
4
A meta-inference framework to integrate multiple external models into a current study.
Biostatistics. 2023 Apr 14;24(2):406-424. doi: 10.1093/biostatistics/kxab017.
5
Semiparametric estimation of the transformation model by leveraging external aggregate data in the presence of population heterogeneity.
Biometrics. 2023 Sep;79(3):1996-2009. doi: 10.1111/biom.13778. Epub 2022 Nov 10.
6
Synthesizing external aggregated information in the presence of population heterogeneity: A penalized empirical likelihood approach.
Biometrics. 2022 Jun;78(2):679-690. doi: 10.1111/biom.13429. Epub 2021 Feb 11.
7
Empirical Bayes Gaussian likelihood estimation of exposure distributions from pooled samples in human biomonitoring.
Stat Med. 2014 Dec 10;33(28):4999-5014. doi: 10.1002/sim.6304. Epub 2014 Sep 12.
8
Bias correction for multiple covariate analysis using empirical bayesian estimation in mixed-effects models for longitudinal data.
Comput Biol Chem. 2022 Aug;99:107697. doi: 10.1016/j.compbiolchem.2022.107697. Epub 2022 May 23.
9
Simultaneous selection and incorporation of consistent external aggregate information.
Stat Med. 2023 Dec 30;42(30):5630-5645. doi: 10.1002/sim.9929. Epub 2023 Oct 3.
10
Integrating external summary information in the presence of prior probability shift: an application to assessing essential hypertension.
Biometrics. 2024 Jul 1;80(3). doi: 10.1093/biomtc/ujae090.

Cited by

1
A Weighted Survival Regression Framework for Incorporating External Prediction Information.
J Stat Theory Pract. 2025;19(4):61. doi: 10.1007/s42519-025-00471-1. Epub 2025 Jul 25.
2
Robust angle-based transfer learning in high dimensions.
J R Stat Soc Series B Stat Methodol. 2024 Dec 3;87(3):723-745. doi: 10.1093/jrsssb/qkae111. eCollection 2025 Jul.
3
A comparison of some existing and novel methods for integrating historical models to improve estimation of coefficients in logistic regression.
J R Stat Soc Ser A Stat Soc. 2024 Sep 24;188(1):46-67. doi: 10.1093/jrsssa/qnae093. eCollection 2025 Jan.
4
Improving prediction of linear regression models by integrating external information from heterogeneous populations: James-Stein estimators.
Biometrics. 2024 Jul 1;80(3). doi: 10.1093/biomtc/ujae072.
5
Robust data integration from multiple external sources for generalized linear models with binary outcomes.
Biometrics. 2024 Jan 29;80(1). doi: 10.1093/biomtc/ujad005.
6
Integrating Information from Existing Risk Prediction Models with No Model Details.
Can J Stat. 2023 Jun;51(2):355-374. doi: 10.1002/cjs.11701. Epub 2022 Apr 15.
7
A synthetic data integration framework to leverage external summary-level information from heterogeneous populations.
Biometrics. 2023 Dec;79(4):3831-3845. doi: 10.1111/biom.13852. Epub 2023 Apr 4.
8
Data integration: exploiting ratios of parameter estimates from a reduced external model.
Biometrika. 2022 Apr 12;110(1):119-134. doi: 10.1093/biomet/asac022. eCollection 2023 Mar.
9
A meta-inference framework to integrate multiple external models into a current study.
Biostatistics. 2023 Apr 14;24(2):406-424. doi: 10.1093/biostatistics/kxab017.
10
Synthetic data method to incorporate external information into a current study.
Can J Stat. 2019 Dec;47(4):580-603. doi: 10.1002/cjs.11513. Epub 2019 Jun 26.

References

1
Comparison of approaches for incorporating new information into existing risk prediction models.
Stat Med. 2017 Mar 30;36(7):1134-1156. doi: 10.1002/sim.7190. Epub 2016 Dec 11.
2
Comment: Addressing the Need for Portability in Big Data Model Building and Calibration.
J Am Stat Assoc. 2016;111(513):127-129. doi: 10.1080/01621459.2016.1149406. Epub 2016 May 5.
3
Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources.
J Am Stat Assoc. 2016 Mar;111(513):107-117. doi: 10.1080/01621459.2015.1123157. Epub 2016 May 5.
4
Urine TMPRSS2:ERG Plus PCA3 for Individualized Prostate Cancer Risk Assessment.
Eur Urol. 2016 Jul;70(1):45-53. doi: 10.1016/j.eururo.2015.04.039. Epub 2015 May 16.
5
Connections between survey calibration estimators and semiparametric models for incomplete data.
Int Stat Rev. 2011 Aug;79(2):200-220. doi: 10.1111/j.1751-5823.2011.00138.x.
6
Exploiting gene-environment independence for analysis of case-control studies: an empirical Bayes-type shrinkage estimator to trade-off between bias and efficiency.
Biometrics. 2008 Sep;64(3):685-694. doi: 10.1111/j.1541-0420.2007.00953.x. Epub 2007 Dec 20.
7
Assessing prostate cancer risk: results from the Prostate Cancer Prevention Trial.
J Natl Cancer Inst. 2006 Apr 19;98(8):529-34. doi: 10.1093/jnci/djj131.