Model misspecification and robust analysis for outcome-dependent sampling designs under generalized linear models.

Affiliations

Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA.

Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.

Publication information

Stat Med. 2023 Apr 30;42(9):1338-1352. doi: 10.1002/sim.9673. Epub 2023 Feb 9.

DOI:10.1002/sim.9673
PMID:36757145
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10883476/
Abstract

Outcome-dependent sampling (ODS) is a commonly used class of sampling designs to increase estimation efficiency in settings where response information (and possibly adjuster covariates) is available, but the exposure is expensive and/or cumbersome to collect. We focus on ODS within the context of a two-phase study, where in Phase One the response and adjuster covariate information is collected on a large cohort that is representative of the target population, but the expensive exposure variable is not yet measured. In Phase Two, using response information from Phase One, we selectively oversample a subset of informative subjects in whom we collect expensive exposure information. Importantly, the Phase Two sample is no longer representative, and we must use ascertainment-correcting analysis procedures for valid inferences. In this paper, we focus on likelihood-based analysis procedures, particularly a conditional-likelihood approach and a full-likelihood approach. Whereas the full-likelihood retains incomplete Phase One data for subjects not selected into Phase Two, the conditional-likelihood explicitly conditions on Phase Two sample selection (ie, it is a "complete case" analysis procedure). These designs and analysis procedures are typically implemented assuming a known, parametric model for the response distribution. However, in this paper, we approach analyses implementing a novel semi-parametric extension to generalized linear models (SPGLM) to develop likelihood-based procedures with improved robustness to misspecification of distributional assumptions. We specifically focus on the common setting where standard GLM distributional assumptions are not satisfied (eg, misspecified mean/variance relationship). We aim to provide practical design guidance and flexible tools for practitioners in these settings.
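
To illustrate why the Phase Two sample needs an ascertainment-correcting analysis, here is a minimal simulation sketch. It is not the SPGLM procedure of the paper: it assumes a fully parametric Gaussian linear model with zero intercept and known unit variance, a single exposure `x`, and hypothetical design values (cutoffs `CUT` and stratum sampling probabilities `PROBS`) chosen only for illustration. It compares a naive complete-case least-squares slope with the maximizer of a conditional likelihood that divides each contribution by the probability of being selected into Phase Two given `x`.

```python
import math
import random

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulate_two_phase(n, beta, cut, probs, rng):
    """Phase One: y is seen for everyone. Phase Two: x is 'measured'
    with a probability that depends only on the stratum of y."""
    phase_two = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)              # expensive exposure
        y = beta * x + rng.gauss(0.0, 1.0)   # response, unit error variance
        stratum = 0 if y < cut[0] else (2 if y > cut[1] else 1)
        if rng.random() < probs[stratum]:
            phase_two.append((x, y))
    return phase_two

def neg_cond_loglik(beta, data, cut, probs):
    """Negative conditional log-likelihood, up to an additive constant.
    Each term is -log f(y|x) + log P(selected | x); the factor pi(y)
    does not involve beta and is dropped."""
    total = 0.0
    for x, y in data:
        mu = beta * x
        lo = norm_cdf(cut[0] - mu)           # P(y < lower cutoff | x)
        hi = norm_cdf(cut[1] - mu)           # P(y <= upper cutoff | x)
        p_select = probs[0] * lo + probs[1] * (hi - lo) + probs[2] * (1.0 - hi)
        total += 0.5 * (y - mu) ** 2 + math.log(p_select)
    return total

# Hypothetical design values (assumptions for illustration only).
TRUE_BETA = 0.5
CUT = (-1.4, 1.4)          # fixed outcome cutoffs defining three strata
PROBS = (1.0, 0.05, 1.0)   # oversample both tails of the response

rng = random.Random(2023)
data = simulate_two_phase(5000, TRUE_BETA, CUT, PROBS, rng)

# Naive complete-case slope (least squares through the origin).
beta_naive = sum(x * y for x, y in data) / sum(x * x for x, _ in data)

# Conditional-likelihood slope via a simple grid search on [-1, 2].
grid = [i / 100.0 for i in range(-100, 201)]
beta_cl = min(grid, key=lambda b: neg_cond_loglik(b, data, CUT, PROBS))

print("Phase Two size:", len(data))
print("naive slope:", round(beta_naive, 3))
print("conditional-likelihood slope:", round(beta_cl, 3))
```

Under these assumptions the naive slope should be pulled away from the true value because the tails of the response are oversampled, while the conditional-likelihood estimate should land near it. A grid search is used instead of a numerical optimizer only to keep the sketch dependency-free; a full-likelihood analysis would additionally use the Phase One responses of unselected subjects, which this sketch omits.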

