Evidence of questionable research practices in clinical prediction models.

Affiliations

Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, Kelvin Grove, Queensland, Australia.

Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Sciences, University of Oxford, Oxford, UK.

Publication information

BMC Med. 2023 Sep 4;21(1):339. doi: 10.1186/s12916-023-03048-6.

DOI:10.1186/s12916-023-03048-6

PMID:37667344

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10478406/

Abstract

BACKGROUND

Clinical prediction models are widely used in health and medical research. The area under the receiver operating characteristic curve (AUC) is a frequently used estimate to describe the discriminatory ability of a clinical prediction model. The AUC is often interpreted relative to thresholds, with "good" or "excellent" models defined at 0.7, 0.8 or 0.9. These thresholds may create targets that result in "hacking", where researchers are motivated to re-analyse their data until they achieve a "good" result.

METHODS

We extracted AUC values from PubMed abstracts to look for evidence of hacking. We used histograms of the AUC values in bins of size 0.01 and compared the observed distribution to a smooth distribution from a spline.
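The binning and smoothing step described above can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say which spline was used, so a cubic regression spline fitted by least squares stands in for it, and the AUC values are simulated rather than taken from PubMed. The function names, the knot positions, and the `move_prob` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_hacked_aucs(n=100_000, move_prob=0.5,
                         thresholds=(0.7, 0.8, 0.9), seed=0):
    """Simulated AUCs in [0.5, 1] (NOT the study's data).

    A share of the values sitting just below each threshold is nudged
    just above it, mimicking the hypothesised "hacking".
    """
    rng = np.random.default_rng(seed)
    aucs = 0.5 + 0.5 * rng.beta(3, 2, size=n)
    for t in thresholds:
        just_below = (aucs >= t - 0.01) & (aucs < t)
        nudge = just_below & (rng.random(n) < move_prob)
        aucs[nudge] += 0.01
    return aucs


def bin_counts(aucs, width=0.01, lo=0.5, hi=1.0):
    """Histogram counts in bins of the given width, with bin centres."""
    n_bins = int(round((hi - lo) / width))
    edges = np.linspace(lo, hi, n_bins + 1)
    counts, _ = np.histogram(aucs, bins=edges)
    centres = (edges[:-1] + edges[1:]) / 2
    return centres, counts


def spline_basis(x, knots):
    """Cubic truncated-power basis: 1, x, x^2, x^3, (x - k)_+^3 per knot."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def smooth_counts(centres, counts, knots=(0.6, 0.7, 0.8, 0.9)):
    """Least-squares cubic regression spline through the bin counts."""
    basis = spline_basis(centres, knots)
    coef, *_ = np.linalg.lstsq(basis, counts.astype(float), rcond=None)
    return basis @ coef


if __name__ == "__main__":
    centres, counts = bin_counts(simulate_hacked_aucs())
    residuals = counts - smooth_counts(centres, counts)
    for t in (0.7, 0.8, 0.9):
        i = int(round((t - 0.5) / 0.01))  # index of the bin starting at t
        print(f"threshold {t}: below {residuals[i - 1]:+.0f}, "
              f"above {residuals[i]:+.0f}")
```

Because the spline is smooth, it cannot follow a one-bin dip followed by a one-bin spike, so an excess just above a threshold and a shortfall just below it show up as a positive and a negative residual respectively.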

RESULTS

The distribution of 306,888 AUC values showed clear excesses above the thresholds of 0.7, 0.8 and 0.9 and shortfalls below the thresholds.

CONCLUSIONS

The AUCs for some models are over-inflated, which risks exposing patients to sub-optimal clinical decision-making. Greater modelling transparency is needed, including published protocols, and data and code sharing.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4686/10478406/ddd5ab494d42/12916_2023_3048_Fig1_HTML.jpg

Similar articles

ROC curves for clinical prediction models part 1. ROC plots showed no added value above the AUC when evaluating the performance of clinical prediction models.

J Clin Epidemiol. 2020 Oct;126:207-216. doi: 10.1016/j.jclinepi.2020.01.028. Epub 2020 Jul 23.

Can A Multivariate Model for Survival Estimation in Skeletal Metastases (PATHFx) Be Externally Validated Using Japanese Patients?

Clin Orthop Relat Res. 2017 Sep;475(9):2263-2270. doi: 10.1007/s11999-017-5389-3. Epub 2017 May 30.

A program for computing the prediction probability and the related receiver operating characteristic graph.

Anesth Analg. 2010 Dec;111(6):1416-21. doi: 10.1213/ANE.0b013e3181fb919e. Epub 2010 Nov 8.

Are Current Survival Prediction Tools Useful When Treating Subsequent Skeletal-related Events From Bone Metastases?

Clin Orthop Relat Res. 2024 Sep 1;482(9):1710-1721. doi: 10.1097/CORR.0000000000003030. Epub 2024 Mar 22.

Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.

Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.

Scoring system development for prediction of extravesical bladder cancer.

Vojnosanit Pregl. 2014 Sep;71(9):851-7.

Small improvement in the area under the receiver operating characteristic curve indicated small changes in predicted risks.

J Clin Epidemiol. 2016 Nov;79:159-164. doi: 10.1016/j.jclinepi.2016.07.002. Epub 2016 Jul 16.

Supervised Machine Learning Models for Predicting Sepsis-Associated Liver Injury in Patients With Sepsis: Development and Validation Study Based on a Multicenter Cohort Study.

J Med Internet Res. 2025 May 26;27:e66733. doi: 10.2196/66733.

A global goodness-of-fit test for receiver operating characteristic curve analysis via the bootstrap method.

J Biomed Inform. 2005 Oct;38(5):395-403. doi: 10.1016/j.jbi.2005.02.004. Epub 2005 Mar 9.

Cited by

Dehydrotanshinone II A alleviates osteoarthritis via activating PPARγ to inhibit ferroptosis in chondrocytes.

Sci Rep. 2025 Aug 12;15(1):29602. doi: 10.1038/s41598-025-14896-y.

Evaluating predictive performance, validity, and applicability of machine learning models for predicting HIV treatment interruption: a systematic review.

BMC Glob Public Health. 2025 Jul 24;3(1):64. doi: 10.1186/s44263-025-00184-4.

An examination of factors associated with disparities in clinical trial eligibility guided by the Socioecological Model.

Cancer. 2025 Jul 1;131(13):e35944. doi: 10.1002/cncr.35944.

Development of a herpes zoster vaccination intention scale and identification of factors associated with vaccine hesitancy among middle-aged and older attendees in community health centers: A Protection Motivation Theory based study.

Hum Vaccin Immunother. 2025 Dec;21(1):2516947. doi: 10.1080/21645515.2025.2516947. Epub 2025 Jun 16.

Serum metabolic profiling in diabetic kidney disease patients using ultra-high performance liquid chromatography-tandem mass spectrometry.

Diabetol Metab Syndr. 2025 Jun 7;17(1):197. doi: 10.1186/s13098-025-01780-y.

Can machine learning be a reliable tool for predicting hematoma progression following traumatic brain injury? A systematic review and meta-analysis.

Neuroradiology. 2025 May 21. doi: 10.1007/s00234-025-03657-3.

Mortality prediction of heart transplantation using machine learning models: a systematic review and meta-analysis.

Front Artif Intell. 2025 Apr 4;8:1551959. doi: 10.3389/frai.2025.1551959. eCollection 2025.

Bioelectrical impedance analysis for measuring body composition and predicting low muscle mass in apparently healthy pediatric outpatients: a retrospective observational study.

BMC Pediatr. 2025 Apr 16;25(1):303. doi: 10.1186/s12887-025-05579-8.

Combination of urinary biomarkers can predict cardiac surgery-associated acute kidney injury: a systematic review and meta-analysis.

Ann Intensive Care. 2025 Mar 29;15(1):45. doi: 10.1186/s13613-025-01459-7.

Advancing Diabetic Retinopathy Screening: A Systematic Review of Artificial Intelligence and Optical Coherence Tomography Angiography Innovations.

Diagnostics (Basel). 2025 Mar 15;15(6):737. doi: 10.3390/diagnostics15060737.
