
A discussion of calibration techniques for evaluating binary and categorical predictive models.

Authors

Fenlon Caroline, O'Grady Luke, Doherty Michael L, Dunnion John

Affiliations

School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland.

School of Veterinary Medicine, University College Dublin, Belfield, Dublin 4, Ireland.

Publication information

Prev Vet Med. 2018 Jan 1;149:107-114. doi: 10.1016/j.prevetmed.2017.11.018. Epub 2017 Nov 24.

Abstract

Modelling of binary and categorical events is a commonly used tool to simulate epidemiological processes in veterinary research. Logistic and multinomial regression, naïve Bayes, decision trees and support vector machines are popular data mining techniques used to predict the probabilities of events with two or more outcomes. Thorough evaluation of a predictive model is important to validate its ability for use in decision-support or broader simulation modelling. Measures of discrimination, such as sensitivity, specificity and receiver operating characteristics, are commonly used to evaluate how well the model can distinguish between the possible outcomes. However, these discrimination tests cannot confirm that the predicted probabilities are accurate and without bias. This paper describes a range of calibration tests, which typically measure the accuracy of predicted probabilities by comparing them to mean event occurrence rates within groups of similar test records. These include overall goodness-of-fit statistics in the form of the Hosmer-Lemeshow and Brier tests. Visual assessment of prediction accuracy is carried out using plots of calibration and deviance (the difference between the outcome and its predicted probability). The slope and intercept of the calibration plot are compared to the perfect diagonal using the unreliability test. Mean absolute calibration error provides an estimate of the level of predictive error. This paper uses sample predictions from a binary logistic regression model to illustrate the use of calibration techniques. Code is provided to perform the tests in the R statistical programming language. The benefits and disadvantages of each test are described. Discrimination tests are useful for establishing a model's diagnostic abilities, but may not suitably assess the model's usefulness for other predictive applications, such as stochastic simulation. Calibration tests may be more informative than discrimination tests for evaluating models with a narrow range of predicted probabilities or overall prevalence close to 50%, which are common in epidemiological applications. Using a suite of calibration tests alongside discrimination tests allows model builders to thoroughly measure their model's predictive capabilities.
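The paper's worked examples are in R, but the grouped-comparison idea behind three of the measures named above can be sketched briefly. The following Python sketch (not the authors' code; the data are simulated for illustration) computes the Brier score, the Hosmer-Lemeshow statistic over decile groups, and a mean absolute calibration error across the same groups:

```python
# Illustrative sketch of three calibration measures from the abstract:
# Brier score, Hosmer-Lemeshow statistic, and mean absolute calibration
# error. Data are simulated; this is not the code supplied with the paper.
import random

def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow chi-square statistic over probability-sorted groups.

    Records are sorted by predicted probability and split into `groups`
    bins; within each bin, observed event and non-event counts are
    compared with the counts expected from the predictions. The statistic
    is referred to a chi-square distribution with (groups - 2) degrees
    of freedom.
    """
    paired = sorted(zip(probs, outcomes))
    n = len(paired)
    stat = 0.0
    for g in range(groups):
        chunk = paired[g * n // groups:(g + 1) * n // groups]
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        n_g = len(chunk)
        if 0 < expected < n_g:  # guard against degenerate bins
            stat += (observed - expected) ** 2 / expected
            stat += ((n_g - observed) - (n_g - expected)) ** 2 / (n_g - expected)
    return stat

def mean_abs_calibration_error(probs, outcomes, groups=10):
    """Average absolute gap between mean prediction and event rate per bin."""
    paired = sorted(zip(probs, outcomes))
    n = len(paired)
    errs = []
    for g in range(groups):
        chunk = paired[g * n // groups:(g + 1) * n // groups]
        mean_p = sum(p for p, _ in chunk) / len(chunk)
        rate = sum(y for _, y in chunk) / len(chunk)
        errs.append(abs(mean_p - rate))
    return sum(errs) / len(errs)

# Simulated well-calibrated model: outcomes are drawn from the predicted
# probabilities themselves, so all three measures should look healthy.
random.seed(1)
probs = [random.uniform(0.05, 0.95) for _ in range(2000)]
outcomes = [1 if random.random() < p else 0 for p in probs]

print(f"Brier score: {brier_score(probs, outcomes):.3f}")
print(f"Hosmer-Lemeshow statistic (8 df): {hosmer_lemeshow(probs, outcomes):.2f}")
print(f"Mean absolute calibration error: {mean_abs_calibration_error(probs, outcomes):.3f}")
```

For a well-calibrated model the Hosmer-Lemeshow statistic should fall near its degrees of freedom, and the calibration error near zero; a miscalibrated model (e.g. systematically overconfident probabilities) inflates both. The unreliability test and calibration plots described in the paper would additionally regress observed rates on predictions to compare the fitted slope and intercept against the perfect diagonal.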

