SimulaMet, Oslo, Norway.
Oslo Metropolitan University, Oslo, Norway.
Sci Rep. 2022 Apr 8;12(1):5979. doi: 10.1038/s41598-022-09954-8.
Clinicians and software developers need to understand how proposed machine learning (ML) models could improve patient care. No single metric captures all the desirable properties of a model, which is why several metrics are typically reported to summarize a model's performance. Unfortunately, many clinicians find these metrics hard to interpret. Moreover, objectively comparing models across studies is challenging, and no tool exists for comparing models on the same performance metrics. This paper reviews previous ML studies in gastroenterology, explains what different metrics mean in the context of binary classification in those studies, and describes in detail how they should be interpreted. We also release an open-source, web-based tool for calculating the most relevant metrics presented in this paper, so that other researchers and clinicians can easily incorporate them into their research.
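To illustrate why several metrics are usually reported together, here is a minimal Python sketch that derives some widely used binary classification metrics from the four confusion-matrix counts. The abstract does not list which metrics the paper or its web tool cover, so the selection here (accuracy, precision, recall, specificity, F1, and Matthews correlation coefficient) is an assumption. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; not the paper's tool.
    Returns NaN for any metric whose denominator is zero.
    """
    def ratio(num, den):
        return num / den if den else float("nan")

    total = tp + fp + tn + fn
    # Denominator of the Matthews correlation coefficient (MCC)
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, total),
        "precision": ratio(tp, tp + fp),      # positive predictive value
        "recall": ratio(tp, tp + fn),         # sensitivity
        "specificity": ratio(tn, tn + fp),    # true negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Made-up counts for an imbalanced data set (100 positives, 900 negatives)
metrics = binary_metrics(tp=80, fp=10, tn=890, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the accuracy is 0.97 while the recall is only 0.80, which shows how a single headline number can hide missed positive cases when the classes are imbalanced.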