Valizadegan Hamed, Nguyen Quang, Hauskrecht Milos
Department of Computer Science, University of Pittsburgh, USA.
AMIA Annu Symp Proc. 2012;2012:921-30. Epub 2012 Nov 3.
Building classification models from clinical data often requires human experts to label examples. However, it is difficult to obtain a perfect set of labels that everyone agrees on, because medical data are typically very complicated and different experts commonly have different opinions on the same patient data. A solution recently explored by the research community is learning from multiple experts/annotators. The objective of learning from multiple experts is to model the different characteristics of the human experts and combine them to obtain a consensus model. In this work, we study and develop a new probabilistic approach for learning classification models from labels provided by multiple experts. Our method explicitly models three characteristics of annotators and incorporates them into the learning process: their specific prediction model, their consistency, and their bias. We show that, in addition to building a superior classification model, our method also helps to model the behavior of annotators. We applied the proposed method to learn the different characteristics of physicians labeling clinical records for heparin-induced thrombocytopenia (HIT) and to combine them to obtain a final classifier.