The Brier score does not evaluate the clinical utility of diagnostic tests or prediction models.

Authors

Assel Melissa, Sjoberg Daniel D, Vickers Andrew J

Affiliation

Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA.

Publication details

Diagn Progn Res. 2017 Dec 2;1:19. doi: 10.1186/s41512-017-0020-3. eCollection 2017.

DOI:10.1186/s41512-017-0020-3
PMID:31093548
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6460786/
Abstract

BACKGROUND

A variety of statistics have been proposed as tools to help investigators assess the value of diagnostic tests or prediction models. The Brier score has been recommended on the grounds that it is a proper scoring rule that is affected by both discrimination and calibration. However, the Brier score is prevalence dependent in such a way that the rank ordering of tests or models may inappropriately vary by prevalence.
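The prevalence dependence described above can be illustrated with a minimal sketch. It assumes a binary test result is scored as a 0/1 prediction, in which case the Brier score reduces to the misclassification rate. The two tests and their sensitivity and specificity values are hypothetical numbers chosen for illustration, not figures from the paper.

```python
def brier_score(predictions, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(predictions, outcomes)) / len(outcomes)


def brier_binary_test(sensitivity, specificity, prevalence):
    """Expected Brier score of a binary test scored as a 0/1 prediction.

    Every false negative and every false positive contributes a squared
    error of 1, so the score equals the misclassification rate.
    """
    false_negatives = prevalence * (1 - sensitivity)
    false_positives = (1 - prevalence) * (1 - specificity)
    return false_negatives + false_positives


# Hypothetical tests: A is sensitive but unspecific, B is the reverse.
TEST_A = {"sensitivity": 0.95, "specificity": 0.50}
TEST_B = {"sensitivity": 0.50, "specificity": 0.95}

for prevalence in (0.10, 0.90):
    score_a = brier_binary_test(prevalence=prevalence, **TEST_A)
    score_b = brier_binary_test(prevalence=prevalence, **TEST_B)
    print(f"prevalence {prevalence:.2f}: Brier A = {score_a:.3f}, Brier B = {score_b:.3f}")
```

Under these assumed values, test B has the lower (better) Brier score at 10% prevalence and test A has the lower score at 90% prevalence, so the ranking flips with prevalence alone. Whether a given ranking is clinically appropriate depends on the relative harms of false negatives and false positives, which the Brier score does not take as an input.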

METHODS

We explored four common clinical scenarios: comparison of a highly accurate binary test with a continuous prediction model of moderate predictiveness; comparison of two binary tests where the importance of sensitivity versus specificity is inversely associated with prevalence; comparison of models and tests to default strategies of assuming that all or no patients are positive; and comparison of two models with miscalibration in opposite directions.

RESULTS

In each case, we found that the Brier score gave an inappropriate rank ordering of the tests and models. Conversely, net benefit, a decision-analytic measure, gave results that always favored the preferable test or model.
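As a rough illustration of how the two measures can disagree, the sketch below computes net benefit with its standard definition: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The prevalence, threshold, and test characteristics are hypothetical values chosen for illustration, and scoring a binary test as a 0/1 prediction for the Brier score is an assumption of this sketch, not a detail taken from the abstract.

```python
def net_benefit(sensitivity, specificity, prevalence, threshold):
    """Net benefit per patient at a given threshold probability.

    True positives count as benefit; false positives count as harm,
    weighted by the odds threshold / (1 - threshold).
    """
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives - false_positives * threshold / (1 - threshold)


def brier_binary_test(sensitivity, specificity, prevalence):
    """Brier score of a binary result scored as a 0/1 prediction
    (equal to the misclassification rate)."""
    return prevalence * (1 - sensitivity) + (1 - prevalence) * (1 - specificity)


PREVALENCE = 0.10  # assumed: 10% of patients have the disease
THRESHOLD = 0.05   # assumed: a missed case is weighted 19 times a false alarm

# Each strategy is expressed as a (sensitivity, specificity) pair.
STRATEGIES = {
    "test A (sens 0.95, spec 0.50)": (0.95, 0.50),
    "test B (sens 0.50, spec 0.95)": (0.50, 0.95),
    "treat all": (1.00, 0.00),
    "treat none": (0.00, 1.00),
}

for name, (sens, spec) in STRATEGIES.items():
    nb = net_benefit(sens, spec, PREVALENCE, THRESHOLD)
    bs = brier_binary_test(sens, spec, PREVALENCE)
    print(f"{name}: net benefit = {nb:.4f}, Brier = {bs:.3f}")
```

With these assumed inputs, net benefit ranks test A first, ahead of treating everyone, test B, and treating no one, which matches a setting where missed cases are costly. The Brier score instead ranks test B first and places test A behind even the treat-none strategy.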

CONCLUSIONS

Brier score does not evaluate clinical value of diagnostic tests or prediction models. We advocate, as an alternative, the use of decision-analytic measures such as net benefit.

TRIAL REGISTRATION

Not applicable.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2196/6460786/c38238201b38/41512_2017_20_Fig1_HTML.jpg
