Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, Kelvin Grove, Queensland, Australia.
Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Sciences, University of Oxford, Oxford, UK.
BMC Med. 2023 Sep 4;21(1):339. doi: 10.1186/s12916-023-03048-6.
Clinical prediction models are widely used in health and medical research. The area under the receiver operating characteristic curve (AUC) is a frequently used estimate to describe the discriminatory ability of a clinical prediction model. The AUC is often interpreted relative to thresholds, with "good" or "excellent" models defined at 0.7, 0.8 or 0.9. These thresholds may create targets that result in "hacking", where researchers are motivated to re-analyse their data until they achieve a "good" result.
We extracted AUC values from PubMed abstracts to look for evidence of hacking. We plotted histograms of the AUC values in bins of width 0.01 and compared the observed distribution to a smooth distribution fitted with a spline.
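The binning-and-comparison idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the data are synthetic, and a leave-one-out moving average of neighbouring bins stands in for the spline so that the example needs only the standard library. It also assumes AUC values are reported to two decimal places.

```python
def bin_counts(auc_values, lo=0.50, hi=1.00):
    """Count AUC values in bins of width 0.01 over [lo, hi).

    Assumes values are reported to two decimals, so each value maps
    to an integer number of hundredths.
    """
    lo_i, hi_i = round(lo * 100), round(hi * 100)
    counts = [0] * (hi_i - lo_i)
    for v in auc_values:
        idx = round(v * 100) - lo_i
        if 0 <= idx < len(counts):
            counts[idx] += 1
    return counts


def smooth_expected(counts, half_window=3):
    """Expected count per bin from its neighbours, excluding the bin itself.

    A moving average used here as a simple stand-in for a spline fit.
    """
    expected = []
    for i in range(len(counts)):
        left = max(0, i - half_window)
        right = min(len(counts), i + half_window + 1)
        neighbours = [counts[j] for j in range(left, right) if j != i]
        expected.append(sum(neighbours) / len(neighbours))
    return expected


def residuals(counts, half_window=3):
    """Observed minus expected count for every bin."""
    expected = smooth_expected(counts, half_window)
    return [obs - exp for obs, exp in zip(counts, expected)]


# Synthetic data (not from the study): 100 values per bin, with 30 values
# moved from 0.69 to 0.70 to mimic results nudged over a threshold.
values = []
for k in range(50, 100):
    n = 100
    if k == 69:
        n = 70
    elif k == 70:
        n = 130
    values.extend([k / 100] * n)

counts = bin_counts(values)
res = residuals(counts)

# Bins with the largest excess and the largest shortfall.
excess_bin = max(range(len(res)), key=res.__getitem__)
shortfall_bin = min(range(len(res)), key=res.__getitem__)
print("largest excess at AUC", (50 + excess_bin) / 100)
print("largest shortfall at AUC", (50 + shortfall_bin) / 100)
```

A large positive residual just above a threshold paired with a negative residual just below it is the pattern the comparison is designed to expose. A real analysis would replace the moving average with a fitted spline and would need to handle AUC values reported to more than two decimals.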
The distribution of 306,888 AUC values showed clear excesses above the thresholds of 0.7, 0.8 and 0.9 and shortfalls below the thresholds.
The AUCs for some models are over-inflated, which risks exposing patients to sub-optimal clinical decision-making. Greater modelling transparency is needed, including published protocols, and data and code sharing.