Zhu Tingting, Johnson Alistair E W, Behar Joachim, Clifford Gari D
Intelligent Patient Monitoring Group, Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK.
Ann Biomed Eng. 2014 Apr;42(4):871-84. doi: 10.1007/s10439-013-0964-6. Epub 2013 Dec 25.
For medical applications, the ground truth is ascertained through manual labels by clinical experts. However, significant inter-observer variability and various human biases limit accuracy. A probabilistic framework addresses these issues by comparing aggregated human and automated labels to provide a reliable ground truth, without prior knowledge of each annotator's individual performance. As an alternative to median or mean voting strategies, novel contextual features (signal quality and physiology) were introduced to allow the Probabilistic Label Aggregator (PLA) to weight an algorithm or human based on its performance. As a proof of concept, the PLA was applied to estimation of the QT interval (a pro-arrhythmic indicator) from the electrocardiogram, using labels from 20 humans and 48 algorithms crowd-sourced from the 2006 PhysioNet/Computing in Cardiology Challenge database. For automatic annotations, the root mean square error of the PLA was 13.97 ± 0.46 ms, significantly outperforming the best Challenge entry (16.36 ms) as well as mean and median voting strategies (17.67 ± 0.56 ms and 14.44 ± 0.52 ms, respectively; p < 0.05). When selecting three annotators, the PLA improved the annotation accuracy over median aggregation by 10.7% for human annotators and 14.4% for automated algorithms. The PLA could therefore provide an improved "gold standard" for medical annotation tasks even when ground truth is not available.
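To make the contrast between voting and performance-based weighting concrete, here is a minimal Python sketch. It is not the PLA: it omits the contextual features (signal quality and physiology) and the probabilistic model described above, and swaps in a simple iterative inverse-variance weighting. The function names, the variance floor `min_var`, and the toy QT values are all assumptions made for illustration, not data from the study.

```python
from math import sqrt
from statistics import mean, median

def mean_vote(labels):
    """labels[j][i] is annotator j's QT estimate (ms) for record i."""
    return [mean(col) for col in zip(*labels)]

def median_vote(labels):
    return [median(col) for col in zip(*labels)]

def weighted_aggregate(labels, n_iter=20, min_var=1.0):
    """Illustrative stand-in for performance-based weighting (not the PLA).

    Each annotator's error variance is estimated against the current
    consensus, so no ground truth is needed. min_var (ms^2) is an assumed
    floor that stops the consensus collapsing onto a single annotator.
    """
    consensus = mean_vote(labels)
    weights = [1.0 / len(labels)] * len(labels)
    for _ in range(n_iter):
        variances = [
            max(mean((y - z) ** 2 for y, z in zip(row, consensus)), min_var)
            for row in labels
        ]
        inv = [1.0 / v for v in variances]
        total = sum(inv)
        weights = [w / total for w in inv]
        consensus = [
            sum(w * y for w, y in zip(weights, col)) for col in zip(*labels)
        ]
    return consensus, weights

def rmse(estimate, truth):
    return sqrt(mean((e - t) ** 2 for e, t in zip(estimate, truth)))

# Toy data (hypothetical): two consistent annotators and one noisy one.
truth = [400, 420, 380, 410, 390]
labels = [
    [401, 419, 381, 409, 391],
    [399, 421, 379, 411, 389],
    [430, 380, 420, 370, 440],
]
consensus, weights = weighted_aggregate(labels)
print("mean RMSE:    ", rmse(mean_vote(labels), truth))
print("median RMSE:  ", rmse(median_vote(labels), truth))
print("weighted RMSE:", rmse(consensus, truth))
print("weights:      ", weights)
```

In this toy setting the noisy annotator ends up with a very small weight, which is the behaviour that motivates weighting annotators by performance rather than voting. The `truth` list is used only to score the three strategies, never inside the aggregation itself.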