Li Yan
School of Mathematical Sciences, Xiamen University, Xiamen, 361005, People's Republic of China.
BMC Med Res Methodol. 2025 Jan 9;25(1):4. doi: 10.1186/s12874-025-02457-w.
To assess whether the true outcome-generating model can be distinguished from other candidate models for clinical practice using current conventional model performance measures, across a range of simulation scenarios and with cardiovascular disease (CVD) risk prediction as an exemplar.
Thousands of true-model scenarios were used to simulate clinical data. The true models and various candidate models were trained on training datasets and then compared on testing datasets using 25 conventionally used model performance measures. The study comprised a univariate simulation (179.2k simulated datasets and over 1.792 million models), a multivariate simulation (728k simulated datasets and over 8.736 million models) and a CVD risk prediction case analysis.
True models had an overall C statistic (95% range) of 0.67 (0.51, 0.96) across all scenarios in the univariate simulation, 0.81 (0.54, 0.98) in the multivariate simulation, 0.85 (0.82, 0.88) in the univariate case analysis and 0.85 (0.82, 0.88) in the multivariate case analysis. The measures showed very clear differences between the true model and the coin-flip model, little or no difference between the true model and candidate models with extra noise, and relatively small differences between the true model and proxy models missing causal predictors.
The study found that current conventional measures for binary outcomes do not always identify the true model as the best-performing model, even when the true model is present in the clinical data. New statistical approaches or measures should be established to distinguish the causal true model from proxy models, especially proxy models with extra noise and/or missing causal predictors.
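The comparison described above can be illustrated with a minimal sketch. It is not the study's code: the true model (logit(p) = -0.5 + x1 + x2), the sample sizes, the single noise predictor `z`, and all function names are assumptions chosen for illustration, and only the C statistic is computed rather than all 25 measures. A logistic model is fitted by plain gradient descent on a training set, and four candidates (true, true plus noise, proxy missing `x2`, coin flip) are scored on a test set.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, steps=200, lr=1.0):
    """Fit logistic regression by batch gradient descent; returns [intercept, coefs...]."""
    k, n = len(X[0]), len(X)
    w = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for row, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))) - yi
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w, X):
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))) for row in X]


def c_statistic(scores, labels):
    """Concordance: share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, yi in zip(scores, labels) if yi == 1]
    neg = [s for s, yi in zip(scores, labels) if yi == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def simulate(n, rng):
    """Assumed true model: logit(p) = -0.5 + x1 + x2; z is a pure noise predictor."""
    rows, y = [], []
    for _ in range(n):
        x1, x2, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append((x1, x2, z))
        y.append(1 if rng.random() < sigmoid(-0.5 + x1 + x2) else 0)
    return rows, y


def run_simulation(seed=0, n_train=1000, n_test=1000):
    rng = random.Random(seed)
    train, y_train = simulate(n_train, rng)
    test, y_test = simulate(n_test, rng)
    # Each candidate model is defined by the predictor columns it uses.
    candidates = {"true": (0, 1), "extra_noise": (0, 1, 2), "proxy_missing_x2": (0,)}
    results = {}
    for name, cols in candidates.items():
        X_train = [[r[c] for c in cols] for r in train]
        X_test = [[r[c] for c in cols] for r in test]
        w = fit_logistic(X_train, y_train)
        results[name] = c_statistic(predict(w, X_test), y_test)
    # Coin-flip model: scores unrelated to the outcome.
    results["coin_flip"] = c_statistic([rng.random() for _ in y_test], y_test)
    return results


results = run_simulation()
for name, c in results.items():
    print(f"{name:>18}: C = {c:.3f}")
```

Under these assumptions, the C statistic of the model with the extra noise predictor is expected to sit very close to that of the true model, which is the kind of near-tie the abstract reports; a single run on one seed is only illustrative and does not reproduce the study's results.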