Van Hoorde K, Van Huffel S, Timmerman D, Bourne T, Van Calster B
KU Leuven, Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing, and Data Analytics, Leuven, Belgium; KU Leuven, iMinds Medical Information Technologies, Leuven, Belgium.
Department of Obstetrics & Gynecology, University Hospitals Leuven, Leuven, Belgium; KU Leuven, Department of Development & Regeneration, Leuven, Belgium.
J Biomed Inform. 2015 Apr;54:283-93. doi: 10.1016/j.jbi.2014.12.016. Epub 2015 Jan 9.
When validating risk models (or probabilistic classifiers), calibration is often overlooked. Calibration refers to the reliability of the predicted risks, i.e. whether the predicted risks correspond to observed probabilities. In medical applications this is important because treatment decisions often rely on the estimated risk of disease. The aim of this paper is to present generic tools to assess the calibration of multiclass risk models. We describe a calibration framework based on a vector spline multinomial logistic regression model. This framework can be used to generate calibration plots and to calculate the estimated calibration index (ECI), which quantifies lack of calibration. We illustrate these tools using risk models that characterize ovarian tumors. The outcome is the final histological diagnosis, combined with the surgical stage of the tumor where relevant, and is divided into five classes: benign, borderline malignant, stage I, stage II-IV, and secondary metastatic cancer. The 5909 patients included in the study were randomly split into equally large training and test sets. We developed and tested models using the following algorithms: logistic regression, support vector machines, k nearest neighbors, random forest, naive Bayes and nearest shrunken centroids. Multiclass calibration plots are a useful way to visualize the reliability of predicted risks. The ECI is a convenient tool for comparing models, but it is less informative and less interpretable than calibration plots. In our case study, logistic regression and random forest showed the highest degree of calibration, and naive Bayes the lowest.
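To make the idea of multiclass calibration concrete, the following is a minimal Python sketch of a much simpler check than the one described in the abstract. It groups cases into bins of predicted risk for each class (one class versus the rest) and compares the mean predicted risk with the observed class frequency in each bin. This binning approach is a stand-in chosen for illustration: it is not the vector spline multinomial logistic regression framework of the paper, and the summary number it returns is not the ECI. The function names `binned_calibration` and `calibration_gap`, the number of bins, and the toy data are all assumptions introduced here.

```python
from collections import defaultdict


def binned_calibration(probs, labels, n_classes, n_bins=5):
    """For each class, bin cases by predicted risk and return, per bin,
    (mean predicted risk, observed class frequency, number of cases)."""
    table = {}
    for k in range(n_classes):
        bins = defaultdict(list)
        for p, y in zip(probs, labels):
            # A predicted risk of exactly 1.0 is placed in the top bin.
            b = min(int(p[k] * n_bins), n_bins - 1)
            bins[b].append((p[k], 1.0 if y == k else 0.0))
        table[k] = [
            (
                sum(pred for pred, _ in rows) / len(rows),  # mean predicted risk
                sum(obs for _, obs in rows) / len(rows),    # observed frequency
                len(rows),
            )
            for _, rows in sorted(bins.items())
        ]
    return table


def calibration_gap(table, n_cases):
    """Case-weighted mean squared gap between predicted and observed
    risk, averaged over classes (0 means no gap in any bin)."""
    total = 0.0
    for rows in table.values():
        total += sum(n * (pred - obs) ** 2 for pred, obs, n in rows) / n_cases
    return total / len(table)


# Toy data (invented): three classes, four cases with identical predictions.
probs = [[0.5, 0.5, 0.0]] * 4
well_calibrated = [0, 1, 0, 1]   # classes 0 and 1 each occur half the time
poorly_calibrated = [0, 0, 0, 0]  # class 0 always occurs

gap_good = calibration_gap(binned_calibration(probs, well_calibrated, 3), 4)
gap_bad = calibration_gap(binned_calibration(probs, poorly_calibrated, 3), 4)
print(gap_good, gap_bad)
```

In this toy example the first set of labels matches the predicted risks within every bin, so its gap is zero, while the second set yields a clearly positive gap. A flexible calibration model such as the one in the abstract avoids the arbitrary choice of bins and handles all classes jointly rather than one class at a time.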