Connolly Brian, Cohen K Bretonnel, Santel Daniel, Bayram Ulya, Pestian John
Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Ave., MLC 7024, Cincinnati, OH, 45229-3039, USA.
Computational Bioscience Program, University of Colorado School of Medicine, Denver, CO, USA.
BMC Bioinformatics. 2017 Aug 7;18(1):361. doi: 10.1186/s12859-017-1736-3.
Probabilistic assessments of clinical care are essential for quality care. Yet machine learning (ML), which supports this care process, has been limited to categorical results. To maximize its usefulness, it is important to find novel approaches that calibrate the ML output with a likelihood scale. Current state-of-the-art calibration methods are generally accurate and applicable to many ML models, but improved granularity and accuracy of such methods would increase the information available for clinical decision making. The novel non-parametric Bayesian approach presented here is demonstrated on a variety of data sets, including simulated classifier outputs, biomedical data sets from the University of California, Irvine (UCI) Machine Learning Repository, and a clinical data set built to determine suicide risk from the language of emergency department patients.
The method is first demonstrated on support-vector machine (SVM) models, which generally produce well-behaved, well-understood scores. The method produces calibrations that are comparable to those of the state-of-the-art Bayesian Binning in Quantiles (BBQ) method when the SVM models are able to effectively separate cases and controls. However, as the SVM models' ability to discriminate between classes decreases, our approach yields more granular and dynamic calibrated probabilities than the BBQ method. Improvements in granularity and range are even more dramatic when the discrimination between the classes is artificially degraded by replacing the SVM model with an ad hoc k-means classifier.
The method gives both clinicians and patients a more nuanced view of the output of an ML model, allowing better decision making. The method is demonstrated on simulated data, various biomedical data sets, and a clinical data set, to which diverse ML methods are applied. A trivial extension of the method to (non-ML) clinical scores is also discussed.
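To make concrete what calibrating a classifier score against a probability scale can look like, the following is a minimal sketch of a much simpler, generic technique: equal-frequency (quantile) binning with a uniform Beta prior on each bin. It is neither the non-parametric Bayesian method of this paper nor BBQ, whose details the abstract does not give. The function names, the prior parameters, and the toy scores and labels are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def fit_binned_calibrator(scores, labels, n_bins=5, prior_a=1.0, prior_b=1.0):
    """Fit an equal-frequency binning calibrator (illustrative only).

    Each bin's probability is the posterior mean of a Beta(prior_a, prior_b)
    prior updated with the positive/negative counts that fall in the bin.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_bins = max(1, min(n_bins, n))  # never ask for more bins than samples
    edges = []  # score boundaries between consecutive bins
    probs = []  # calibrated probability assigned to each bin
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        positives = sum(label for _, label in chunk)
        probs.append((positives + prior_a) / (len(chunk) + prior_a + prior_b))
        if b < n_bins - 1:
            # Boundary is the midpoint between the last score of this bin
            # and the first score of the next bin.
            edges.append((pairs[hi - 1][0] + pairs[hi][0]) / 2.0)
    return edges, probs


def calibrate(score, edges, probs):
    """Map a raw classifier score to the probability of its bin."""
    return probs[bisect_right(edges, score)]


# Hypothetical toy data: SVM-like decision scores with 0/1 class labels.
toy_scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]

toy_edges, toy_probs = fit_binned_calibrator(toy_scores, toy_labels, n_bins=4)
for s in (-1.8, 0.2, 1.8):
    print(s, calibrate(s, toy_edges, toy_probs))
```

A fixed-bin scheme like this can only return as many distinct probabilities as it has bins, which is one way to picture the limited granularity that the abstract contrasts with the proposed approach.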