Evaluating disease prediction models using a cohort whose covariate distribution differs from that of the target population.

Affiliations

1 Department of Statistics, Stanford University, Stanford, CA, USA.

2 Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA, USA.

Publication information

Stat Methods Med Res. 2019 Jan;28(1):309-320. doi: 10.1177/0962280217723945. Epub 2017 Aug 16.

DOI: 10.1177/0962280217723945

PMID: 28812439

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5895541/

Abstract

Personal predictive models for disease development play important roles in chronic disease prevention. The performance of these models is evaluated by applying them to the baseline covariates of participants in external cohort studies and comparing the model predictions with the subjects' subsequent disease incidence. However, the covariate distribution among participants in a validation cohort may differ from that of the population for which the model will be used. Since estimates of predictive model performance depend on the distribution of covariates among the subjects to which the model is applied, such differences can produce misleading estimates of model performance in the target population. We propose a method for addressing this problem by weighting the cohort subjects so that their covariate distribution better matches that of the target population. Simulations show that the method provides accurate estimates of model performance in the target population, while unweighted estimates may not. We illustrate the method by applying it to evaluate an ovarian cancer prediction model targeted to US women, using cohort data from participants in the California Teachers Study. The methods are implemented in the open-source R package RMAP (Risk Model Assessment Package), available for public use at http://stanford.edu/~ggong/rmap/.
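The weighting idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' estimator and not the RMAP interface. It is a simple post-stratification example built on several hypothetical assumptions: there is a single categorical covariate, the target population's stratum proportions are known from an external source (for example a population survey), and a binary outcome is observed for every cohort member over a fixed horizon. Censoring and competing risks, which matter for incidence in a real cohort, are therefore ignored. Each subject in stratum k receives the weight w_k = P_target(k) / P_cohort(k). These weights are then plugged into two common performance summaries: the expected-to-observed (E/O) case ratio, and the area under the ROC curve (AUC), estimated as a weighted concordance over pairs of one case and one non-case. The function names `stratum_weights`, `weighted_eo_ratio` and `weighted_auc` are illustrative inventions.

```python
"""Sketch: reweight a validation cohort toward a target population's
covariate distribution, then compute a weighted E/O ratio and AUC.

Hypothetical illustration only (not the RMAP API). Assumes one categorical
covariate and a binary outcome observed for everyone (no censoring or
competing risks).
"""
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def stratum_weights(strata: Sequence[Hashable],
                    target_props: Dict[Hashable, float]) -> List[float]:
    """Post-stratification weight w_k = P_target(k) / P_cohort(k) per subject.

    Target strata that never occur in the cohort cannot be represented by
    any reweighting; this sketch simply ignores them.
    """
    n = len(strata)
    cohort_props = {k: c / n for k, c in Counter(strata).items()}
    missing = set(cohort_props) - set(target_props)
    if missing:
        raise ValueError(f"no target proportion for strata: {sorted(map(str, missing))}")
    return [target_props[k] / cohort_props[k] for k in strata]


def weighted_eo_ratio(risks: Sequence[float], outcomes: Sequence[int],
                      weights: Sequence[float]) -> float:
    """Weighted expected cases (sum of w*risk) over weighted observed cases (sum of w*y)."""
    expected = sum(w * r for w, r in zip(weights, risks))
    observed = sum(w * y for w, y in zip(weights, outcomes))
    return expected / observed


def weighted_auc(risks: Sequence[float], outcomes: Sequence[int],
                 weights: Sequence[float]) -> float:
    """Weighted concordance: each (case, non-case) pair counts with weight
    w_case * w_noncase; the pair is concordant if the case has the higher
    predicted risk, and ties count 1/2."""
    cases = [(r, w) for r, y, w in zip(risks, outcomes, weights) if y == 1]
    controls = [(r, w) for r, y, w in zip(risks, outcomes, weights) if y == 0]
    num = den = 0.0
    for r_case, w_case in cases:
        for r_ctrl, w_ctrl in controls:
            pair_w = w_case * w_ctrl
            den += pair_w
            if r_case > r_ctrl:
                num += pair_w
            elif r_case == r_ctrl:
                num += 0.5 * pair_w
    return num / den


if __name__ == "__main__":
    # Hypothetical toy data: the cohort is 75% "young", the target is assumed 50/50.
    strata = ["young", "young", "young", "old"]
    risks = [0.10, 0.35, 0.30, 0.40]   # model-predicted risks
    outcomes = [0, 0, 1, 1]            # observed disease status
    w = stratum_weights(strata, {"young": 0.5, "old": 0.5})
    ones = [1.0] * len(risks)          # unit weights = unweighted estimate
    print(f"E/O unweighted {weighted_eo_ratio(risks, outcomes, ones):.4f}, "
          f"weighted {weighted_eo_ratio(risks, outcomes, w):.4f}")
    print(f"AUC unweighted {weighted_auc(risks, outcomes, ones):.4f}, "
          f"weighted {weighted_auc(risks, outcomes, w):.4f}")
```

In this made-up example, the cohort over-represents the "young" stratum relative to the assumed target (75% versus 50%). Reweighting moves the E/O ratio from 0.575 to 0.4875 and the AUC from 0.75 to 0.875. This echoes the abstract's point that unweighted estimates can misstate performance in the target population. Post-stratification is only one way to construct such weights. With several or continuous covariates, a modelled density ratio would typically be needed instead.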

Similar articles

Assessing the goodness of fit of personal risk models.

Stat Med. 2014 Aug 15;33(18):3179-90. doi: 10.1002/sim.6176. Epub 2014 Apr 22.

Two-stage sampling designs for external validation of personal risk models.

Stat Methods Med Res. 2016 Aug;25(4):1313-29. doi: 10.1177/0962280213480420. Epub 2013 Apr 16.

Power, selection bias and predictive performance of the Population Pharmacokinetic Covariate Model.

J Pharmacokinet Pharmacodyn. 2004 Apr;31(2):109-34. doi: 10.1023/b:jopa.0000034404.86036.72.

Estimating the area under the ROC curve when transporting a prediction model to a target population.

Biometrics. 2023 Sep;79(3):2382-2393. doi: 10.1111/biom.13796. Epub 2022 Nov 25.

Comparison of stepwise covariate model building strategies in population pharmacokinetic-pharmacodynamic analysis.

AAPS PharmSci. 2002;4(4):E27. doi: 10.1208/ps040427.

Logistic regression of family data from retrospective study designs.

Genet Epidemiol. 2003 Nov;25(3):177-89. doi: 10.1002/gepi.10267.

External Validation and Optimization of the SPRING Model for Prediction of Survival After Surgical Treatment of Bone Metastases of the Extremities.

Clin Orthop Relat Res. 2018 Aug;476(8):1591-1599. doi: 10.1097/01.blo.0000534678.44152.ee.

A comparison of entropy balance and probability weighting methods to generalize observational cohorts to a population: a simulation and empirical example.

Pharmacoepidemiol Drug Saf. 2017 Apr;26(4):368-377. doi: 10.1002/pds.4121. Epub 2016 Nov 13.

Addressing bias in prediction models by improving subpopulation calibration.

J Am Med Inform Assoc. 2021 Mar 1;28(3):549-558. doi: 10.1093/jamia/ocaa283.

Cited by

Avoiding Biased Clinical Machine Learning Model Performance Estimates in the Presence of Label Selection.

AMIA Jt Summits Transl Sci Proc. 2023 Jun 16;2023:81-90. eCollection 2023.

Accommodating population differences when validating risk prediction models.

Stat Med. 2022 Oct 30;41(24):4756-4780. doi: 10.1002/sim.9447. Epub 2022 Jul 5.

Personalized antibiograms for machine learning driven antibiotic selection.

Commun Med (Lond). 2022 Apr 8;2:38. doi: 10.1038/s43856-022-00094-8. eCollection 2022.

Modeling risks of cardiovascular and cancer mortality following a diagnosis of loco-regional breast cancer.

Breast Cancer Res. 2021 Sep 27;23(1):91. doi: 10.1186/s13058-021-01469-w.

Improving External Validity of Epidemiologic Cohort Analyses: A Kernel Weighting Approach.

J R Stat Soc Ser A Stat Soc. 2020 Jun;183(3):1293-1311. doi: 10.1111/rssa.12564. Epub 2020 Apr 25.

Considerations When Using Breast Cancer Risk Models for Women with Negative BRCA1/BRCA2 Mutation Results.

J Natl Cancer Inst. 2020 Apr 1;112(4):418-422. doi: 10.1093/jnci/djz194.
