Internal-external cross-validation helped to evaluate the generalizability of prediction models in large clustered datasets.

Author information

Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Universiteitsweg 100, 3584 CG, Utrecht, The Netherlands.

Health Data Research UK and Institute of Health Informatics, University College London, Gibbs Building, 215 Euston Road, London, NW1 2BE, United Kingdom; The Alan Turing Institute, British Library, 96 Euston Road, London, NW1 2DB, United Kingdom; The National Institute for Health Research University College London Hospitals Biomedical Research Centre, University College London, Suite A, 1st floor, Maple House, 149 Tottenham Court Road, London, W1T 7DN, United Kingdom; British Heart Foundation Research Accelerator, University College London, Gower Street, London, WC1E 6BT, United Kingdom.

Publication information

J Clin Epidemiol. 2021 Sep;137:83-91. doi: 10.1016/j.jclinepi.2021.03.025. Epub 2021 Apr 6.

Abstract

OBJECTIVE

To illustrate how to evaluate the need for complex strategies when developing generalizable prediction models in large clustered datasets.

STUDY DESIGN AND SETTING

We developed eight Cox regression models to estimate the risk of heart failure using a large population-level dataset. These models differed in the number of predictors, the functional form of the predictor effects (non-linear effects and interaction) and the estimation method (maximum likelihood and penalization). Internal-external cross-validation was used to evaluate the models' generalizability across the included general practices.
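As a rough illustration of the internal-external cross-validation loop described above, the sketch below holds out one general practice at a time, fits a model on the remaining practices, and evaluates it on the held-out practice. Everything in it is an assumption made for illustration: the function names (`internal_external_cv`, `fit_null_model`, `observed_expected_ratio`), the toy records, and in particular the "model", which is a constant-risk stand-in rather than a Cox regression and ignores censoring.

```python
from collections import defaultdict


def internal_external_cv(records, fit, evaluate):
    """Leave one cluster out at a time: fit on the rest, evaluate on the held-out cluster."""
    by_cluster = defaultdict(list)
    for cluster_id, event in records:
        by_cluster[cluster_id].append(event)

    performance = {}
    for held_out, test_events in by_cluster.items():
        # Training data: every cluster except the one currently held out.
        train_events = [
            e
            for cid, events in by_cluster.items()
            if cid != held_out
            for e in events
        ]
        model = fit(train_events)
        performance[held_out] = evaluate(model, test_events)
    return performance


def fit_null_model(train_events):
    # Hypothetical stand-in for a Cox model: one shared predicted risk,
    # equal to the event proportion in the training practices.
    return sum(train_events) / len(train_events)


def observed_expected_ratio(predicted_risk, test_events):
    # Observed events divided by the number expected under the model.
    expected = predicted_risk * len(test_events)
    return sum(test_events) / expected


# Toy data (made up): (practice id, event indicator) per individual.
records = [("A", 1), ("A", 0), ("B", 0), ("B", 0), ("C", 1), ("C", 1)]
results = internal_external_cv(records, fit_null_model, observed_expected_ratio)
for practice, oe in sorted(results.items()):
    print(practice, oe)
```

Each practice receives its own performance estimate, so the spread of those estimates across practices indicates how well a modelling strategy transfers to settings it was not developed in. In a real analysis, `fit` would wrap the Cox model under study and `evaluate` would return discrimination and calibration measures.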

RESULTS

Among 871,687 individuals from 225 general practices, 43,987 (5.0%) developed heart failure during a median follow-up time of 5.8 years. For discrimination, the simplest prediction model yielded a good concordance statistic, which was not much improved by adopting complex strategies. Between-practice heterogeneity in discrimination was similar in all models. For calibration, the simplest model performed satisfactorily. Although accounting for non-linear effects and interactions slightly improved the calibration slope, it also led to more heterogeneity in the observed/expected ratio. Similar results were found in a second case study involving patients with stroke.
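To make the discrimination results concrete, the sketch below computes a concordance statistic per practice and a crude measure of between-practice spread. It assumes Harrell's estimator, which the abstract does not specify, and it skips pairs with tied event times for brevity. The function name `harrell_c`, the two practices, and all data values are hypothetical. The range used here is only a simple stand-in for a formal heterogeneity summary.

```python
def harrell_c(times, events, risks):
    """Harrell's C: share of usable pairs in which the individual with the
    earlier observed event also has the higher predicted risk."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a pair is only usable if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / usable


# Made-up held-out data per practice: follow-up times, event indicators
# (1 = event, 0 = censored), and predicted risks from some fitted model.
practices = {
    "P1": ([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.4, 0.5, 0.1]),
    "P2": ([1, 3, 5, 7], [1, 1, 0, 1], [0.7, 0.2, 0.6, 0.3]),
}

c_by_practice = {
    name: harrell_c(times, events, risks)
    for name, (times, events, risks) in practices.items()
}
# Crude heterogeneity summary: range of the practice-level estimates.
c_range = max(c_by_practice.values()) - min(c_by_practice.values())
print(c_by_practice, c_range)
```

Comparing such practice-level summaries between a simple and a more complex model is one way to check whether added complexity changes average performance, its variability across practices, or both.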

CONCLUSION

In large clustered datasets, prediction model studies may adopt internal-external cross-validation to evaluate the generalizability of competing models, and to identify promising modelling strategies.
