Dipartimento di Matematica e Geoscienze, Università degli Studi di Trieste, Trieste, Italy.
Dipartimento di Area Medica, Istituto di Radiologia, Ospedale S. Maria della Misericordia, Università degli Studi di Udine, Udine, Italy.
Med Biol Eng Comput. 2020 Dec;58(12):3089-3099. doi: 10.1007/s11517-020-02261-2. Epub 2020 Nov 3.
Agreement measures are useful tools both to compare different evaluations of the same diagnostic outcomes and to validate new rating systems or devices. Cohen's kappa (κ) is certainly the most popular agreement measure between two raters, and it has proved its effectiveness over the last sixty years. Nevertheless, the method suffers from several well-known issues, highlighted since the 1970s; moreover, its value depends strongly on the prevalence of the disease in the considered sample. This work introduces a new agreement index, the informational agreement (IA), which appears to avoid some of Cohen's kappa's flaws and separates the contribution of prevalence from the nucleus of agreement. These goals are achieved by modelling agreement, in both the dichotomous and the multi-value ordered-categorical cases, as the information shared between the two raters through the virtual diagnostic channel connecting them: the more information exchanged between the raters, the higher their agreement. To assess the fairness and effectiveness of the method, IA was tested on cases known to be problematic for κ, in a machine learning context, and in a clinical scenario comparing ultrasound (US) and the automated breast volume scanner (ABVS) in breast cancer imaging.

Graphical Abstract: To evaluate the agreement between the two raters R1 and R2, we create an agreement channel, based on Shannon's information theory, that directly connects the random variables X and Y expressing the raters' outcomes. These variables are the terminals of the chain X ⇔ diagnostic test performed by R1 ⇔ patient condition ⇔ diagnostic test performed by R2 ⇔ Y.
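The abstract does not spell out the exact formula for IA, but the channel view it describes rests on the Shannon mutual information I(X;Y) between the two raters' outcomes. Below is a minimal sketch, assuming only that the shared information is the standard mutual information computed from the raters' joint contingency table; the function name shared_information_bits and the example counts are illustrative, and whatever normalization turns this quantity into the IA index is not reproduced here.

```python
import numpy as np

def shared_information_bits(joint_counts):
    """Shannon mutual information I(X; Y), in bits, between two raters.

    joint_counts[i, j] = number of cases that rater 1 scored as category i
    and rater 2 scored as category j (the agreement channel's joint law).
    """
    p_xy = joint_counts / joint_counts.sum()   # joint distribution of (X, Y)
    p_x = p_xy.sum(axis=1, keepdims=True)      # marginal of rater 1 (X)
    p_y = p_xy.sum(axis=0, keepdims=True)      # marginal of rater 2 (Y)
    indep = p_x @ p_y                          # product of the marginals
    mask = p_xy > 0                            # skip 0 * log 0 terms
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / indep[mask])))

# Hypothetical dichotomous example with low disease prevalence: the raters
# agree on 95 of 100 cases. I(X; Y) = 0 would mean the raters' outcomes are
# statistically independent, i.e. no information crosses the channel.
table = np.array([[88.0, 2.0],    # rows: rater 1 negative / positive
                  [3.0, 7.0]])    # cols: rater 2 negative / positive
print(f"I(X;Y) = {shared_information_bits(table):.3f} bits")
```

Note the contrast with κ: mutual information is computed directly from the joint distribution of the two outcomes and vanishes exactly when the raters are independent, which is what motivates reading "more information exchanged" as "higher agreement".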