Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Universiteitsweg 100, 3584 CG, Utrecht, The Netherlands.
Health Data Research UK and Institute of Health Informatics, University College London, Gibbs Building, 215 Euston Road, London, NW1 2BE, United Kingdom; The Alan Turing Institute, British Library, 96 Euston Road, London, NW1 2DB, United Kingdom; The National Institute for Health Research University College London Hospitals Biomedical Research Centre, University College London, Suite A, 1st floor, Maple House, 149 Tottenham Court Road, London, W1T 7DN, United Kingdom; British Heart Foundation Research Accelerator, University College London, Gower Street, London, WC1E 6BT, United Kingdom.
J Clin Epidemiol. 2021 Sep;137:83-91. doi: 10.1016/j.jclinepi.2021.03.025. Epub 2021 Apr 6.
To illustrate how to evaluate the need for complex modelling strategies when developing generalizable prediction models in large clustered datasets.
We developed eight Cox regression models to estimate the risk of heart failure using a large population-level dataset. These models differed in the number of predictors, the functional form of the predictor effects (non-linear effects and interaction) and the estimation method (maximum likelihood and penalization). Internal-external cross-validation was used to evaluate the models' generalizability across the included general practices.
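Internal-external cross-validation holds out one cluster (here, one general practice) at a time: the model is developed on all remaining practices and validated in the held-out one, so every practice serves once as an external validation set. The Python sketch below shows only that loop. It assumes records are dicts with a hypothetical `practice` key and that `fit` and `evaluate` are caller-supplied callables; the toy event-rate "model" is a placeholder for illustration, not the Cox models used in the study.

```python
from collections import defaultdict
from typing import Callable, Dict, List, TypeVar

Model = TypeVar("Model")


def internal_external_cv(
    records: List[dict],
    cluster_key: str,
    fit: Callable[[List[dict]], Model],
    evaluate: Callable[[Model, List[dict]], float],
) -> Dict[str, float]:
    """Leave one cluster out at a time: develop on the remaining clusters,
    validate in the held-out cluster, and return one estimate per cluster."""
    by_cluster: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        by_cluster[rec[cluster_key]].append(rec)

    results: Dict[str, float] = {}
    for held_out, validation in by_cluster.items():
        development = [r for r in records if r[cluster_key] != held_out]
        model = fit(development)  # the held-out cluster is never used for fitting
        results[held_out] = evaluate(model, validation)
    return results


# Hypothetical placeholder model, for illustration only: it "predicts" the
# development-set event proportion for everyone (the study used Cox models).
def fit_event_rate(development: List[dict]) -> float:
    return sum(r["event"] for r in development) / len(development)


def observed_expected_ratio(risk: float, validation: List[dict]) -> float:
    # Observed events divided by expected events in the held-out cluster.
    return sum(r["event"] for r in validation) / (risk * len(validation))


if __name__ == "__main__":
    toy = [  # invented records from three hypothetical practices
        {"practice": "A", "event": 1}, {"practice": "A", "event": 0},
        {"practice": "B", "event": 0}, {"practice": "B", "event": 0},
        {"practice": "C", "event": 1}, {"practice": "C", "event": 1},
    ]
    print(internal_external_cv(toy, "practice", fit_event_rate, observed_expected_ratio))
```

With the invented toy data the loop prints `{'A': 1.0, 'B': 0.0, 'C': 4.0}`. The spread of these held-out O/E ratios is the kind of between-practice heterogeneity that internal-external cross-validation is meant to expose.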
Among 871,687 individuals from 225 general practices, 43,987 (5.0%) developed heart failure during a median follow-up time of 5.8 years. For discrimination, the simplest prediction model yielded a good concordance statistic, which was improved only marginally by adopting complex strategies. Between-practice heterogeneity in discrimination was similar in all models. For calibration, the simplest model performed satisfactorily. Although accounting for non-linear effects and interactions slightly improved the calibration slope, it also led to more heterogeneity in the observed/expected ratio. Similar results were found in a second case study involving patients with stroke.
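The concordance statistic and the observed/expected (O/E) ratio reported above are computed separately in each held-out practice. The Python sketch below makes two simplifying assumptions of its own. First, it uses a simplified Harrell's C, in which pairs with tied follow-up times are skipped and tied risk scores count one half. Second, its O/E ratio treats each prediction as a risk at a fixed horizon with complete follow-up to that horizon, so censoring is ignored. The calibration slope, which requires refitting a model on the linear predictor, is not shown.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[int], risks: Sequence[float]) -> float:
    """Simplified Harrell's C for right-censored data.

    A pair (i, j) is usable when subject i had the event and was followed for a
    strictly shorter time than subject j. It is concordant when i also has the
    higher predicted risk; tied risks count one half.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def observed_expected(events: Sequence[int], predicted_risks: Sequence[float]) -> float:
    """O/E ratio: observed event count over the sum of predicted risks.

    Assumes complete follow-up to the prediction horizon (no censoring).
    """
    expected = sum(predicted_risks)
    if expected <= 0:
        raise ValueError("expected number of events must be positive")
    return sum(events) / expected
```

For example, `harrell_c([1, 2, 3, 4], [1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` finds five usable pairs, four of them concordant, and returns 0.8. An O/E ratio above 1 means the model under-predicts risk in that practice, and a ratio below 1 means it over-predicts.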
In large clustered datasets, prediction model studies may adopt internal-external cross-validation to evaluate the generalizability of competing models, and to identify promising modelling strategies.
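Generalizability across practices is judged from the spread of the per-practice estimates, not only from their average. The abstract does not state how those estimates were summarized. One common choice, assumed here, is a random-effects meta-analysis using the DerSimonian-Laird estimator of the between-practice variance τ². Its inputs are per-practice estimates on a suitable scale, together with their within-practice variances. C-statistics are commonly pooled on the logit scale and O/E ratios on the log scale.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float, float]:
    """Random-effects pooling of per-cluster estimates (DerSimonian-Laird).

    Returns (pooled estimate, tau^2, standard error of the pooled estimate).
    """
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                    # truncated at zero
    w_re = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    return pooled, tau2, math.sqrt(1.0 / sum(w_re))
```

A τ² of zero means the per-practice estimates vary no more than their within-practice uncertainty would predict. A larger τ² signals that performance depends on the practice in which the model is applied.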