Riley Richard D, Collins Gary S, Whittle Rebecca, Archer Lucinda, Snell Kym I E, Dhiman Paula, Kirton Laura, Legha Amardeep, Liu Xiaoxuan, Denniston Alastair K, Harrell Frank E, Wynants Laure, Martin Glen P, Ensor Joie
Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham, Birmingham, UK.
National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK.
Diagn Progn Res. 2025 Jul 8;9(1):14. doi: 10.1186/s41512-025-00193-9.
When using a dataset to develop or update a clinical prediction model, small sample sizes increase concerns about overfitting, instability, poor predictive performance and a lack of fairness. For models estimating the risk of a binary outcome, previous research has outlined sample size calculations that target low overfitting and a precise overall risk estimate. However, more guidance is needed for targeting precise and fair individual-level risk estimates.
We propose a decomposition of Fisher's information matrix to help examine the sample sizes required for developing or updating a model, aiming for precise and fair individual-level risk estimates. We outline a five-step process for use before data collection or when an existing dataset or pilot study is available. It requires researchers to specify the overall risk in the target population, the (anticipated) distribution of key predictors in the model, and an assumed 'core model', either specified directly (i.e. a logistic regression equation is provided) or derived from a specified C-statistic and the relative effects of (standardised) predictors.
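As a concrete illustration of specifying a core model from a C-statistic and relative predictor effects, the following minimal Python sketch rescales a set of standardised effects by simulation until the implied logistic model attains a target C-statistic, while adjusting the intercept to match the assumed overall risk. All inputs here (two standard-normal predictors, relative effects 1.0 and 0.5, overall risk 0.2, target C-statistic 0.75) are hypothetical, and the code is one way to carry out this step rather than the pmstabilityss implementation.

import numpy as np
from scipy.optimize import brentq
from scipy.special import expit
from scipy.stats import rankdata

rng = np.random.default_rng(42)

# Hypothetical inputs (illustrative only, not from the paper)
n_sim = 100_000
X = rng.standard_normal((n_sim, 2))      # two standardised predictors
u = rng.uniform(size=n_sim)              # fixed draws keep the search deterministic
rel_effects = np.array([1.0, 0.5])       # assumed relative effects
target_risk, target_c = 0.2, 0.75

def c_statistic(lp, y):
    # rank-based (Mann-Whitney) estimate of the C-statistic
    r = rankdata(lp)
    n1 = y.sum()
    return (r[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * (len(y) - n1))

def intercept_for(scale):
    # intercept chosen so the mean predicted risk equals target_risk
    lp0 = X @ (scale * rel_effects)
    return brentq(lambda b0: expit(b0 + lp0).mean() - target_risk, -20, 20)

def c_gap(scale):
    b0 = intercept_for(scale)
    lp = b0 + X @ (scale * rel_effects)
    y = (u < expit(lp)).astype(int)      # outcomes simulated from the model
    return c_statistic(lp, y) - target_c

# find the scaling of the standardised effects that attains the target C
scale = brentq(c_gap, 0.05, 5.0)
print("core model intercept:", round(intercept_for(scale), 3),
      "coefficients:", (scale * rel_effects).round(3))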
We produce closed-form solutions that decompose the variance of an individual's risk estimate into Fisher's unit information matrix, the individual's predictor values and the total sample size. This allows researchers to quickly calculate and examine the anticipated precision of individual-level predictions and classifications for specified sample sizes. The information can be presented to key stakeholders (e.g. health professionals, patients, grant funders) to inform target sample sizes for prospective data collection, or to judge whether an existing dataset is sufficient. Our proposal is implemented in our new software module pmstabilityss. We provide two real examples and emphasise the importance of clinical context, including any risk thresholds for decision making and fairness checks.
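In large samples this decomposition takes the familiar maximum-likelihood form Var(logit p_hat_i) ~ x_i' I1^(-1) x_i / n, where I1 is the unit (per-participant) Fisher information matrix, x_i the individual's predictor vector (including the intercept term) and n the total sample size. The Python sketch below, using a hypothetical core model (logit(p) = -1.5 + 0.8*x1 + 0.4*x2 with standard-normal predictors), estimates I1 by Monte Carlo over the assumed predictor distribution and reports approximate 95% uncertainty intervals for one individual's risk at candidate sample sizes; it illustrates the decomposition rather than reproducing pmstabilityss output.

import numpy as np
from scipy.special import expit

rng = np.random.default_rng(1)

# Hypothetical core model (illustrative values, not from the paper)
beta = np.array([-1.5, 0.8, 0.4])

# Monte Carlo estimate of the unit Fisher information matrix
# I1 = E[ p(x) (1 - p(x)) x x' ] under the assumed predictor distribution
m = 500_000
X = np.column_stack([np.ones(m), rng.standard_normal((m, 2))])
p = expit(X @ beta)
w = p * (1 - p)
I1 = (X * w[:, None]).T @ X / m
I1_inv = np.linalg.inv(I1)

def risk_ci(x, n, z=1.96):
    # Approximate 95% interval for an individual's predicted risk at
    # covariate vector x, for development sample size n, using
    # Var(logit p_hat) ~ x' I1^(-1) x / n (large-sample approximation)
    lp = x @ beta
    se = np.sqrt(x @ I1_inv @ x / n)
    return expit(lp), expit(lp - z * se), expit(lp + z * se)

# Anticipated precision for one individual at two candidate sample sizes
x_i = np.array([1.0, 1.2, -0.5])   # intercept term plus predictor values
for n in (500, 2000):
    p_hat, lo, hi = risk_ci(x_i, n)
    print(f"n={n}: risk {p_hat:.3f}, approx 95% interval ({lo:.3f}, {hi:.3f})")

Repeating this calculation across a spread of predictor profiles (and across subgroups relevant to fairness) for each candidate n is what allows the anticipated precision of individual-level predictions to be examined before data collection.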
Our approach helps researchers examine potential sample sizes required to target precise and fair individual-level predictions when developing or updating prediction models for binary outcomes.