Computer Science Department, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.
J Am Med Inform Assoc. 2014 May-Jun;21(3):501-8. doi: 10.1136/amiajnl-2013-001964. Epub 2013 Nov 20.
Learning classification models in medicine often relies on data labeled by a human expert. Since labeling clinical data can be time-consuming, reducing labeling costs is critical to our ability to learn such models automatically. In this paper we propose a new machine learning approach that learns improved binary classification models more efficiently by refining the binary class information in the training phase with soft labels that reflect how strongly the human expert feels about the original class labels.
We propose two types of methods that learn improved binary classification models from soft labels. The first relies on probabilistic (numeric) labels, the second on ordinal categorical labels. We study and demonstrate the benefits of these methods for learning an alerting model for heparin-induced thrombocytopenia. The experiments are conducted on data from 377 patient instances labeled by three different human experts. The methods are compared using the area under the receiver operating characteristic curve (AUC).
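To make the idea of learning from probabilistic soft labels concrete, the following is a minimal sketch, not the method used in the paper, whose details are not given in this abstract. It assumes a plain logistic regression trained by gradient descent on cross-entropy against soft targets in [0, 1], and it evaluates the result with a pairwise AUC. The function names (`fit_soft_logistic`, `auc`) and the four toy data points are hypothetical and invented purely for illustration; they are not taken from the study.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_soft_logistic(X, soft_labels, lr=0.5, epochs=2000):
    """Fit logistic regression by minimizing cross-entropy against
    soft targets in [0, 1] (assumed stand-in for expert confidence)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for x, p in zip(X, soft_labels):
            # Gradient of cross-entropy w.r.t. the logit is (prediction - target).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - p
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration only): one feature per instance,
# a soft label per instance, and the corresponding hard binary label.
X = [[0.1], [0.4], [0.6], [0.9]]
soft = [0.05, 0.30, 0.70, 0.95]
hard = [0, 0, 1, 1]

w, b = fit_soft_logistic(X, soft)
scores = [sigmoid(w[0] * x[0] + b) for x in X]
toy_auc = auc(scores, hard)
print("weight:", w[0], "AUC on toy data:", toy_auc)
```

The soft targets enter only through the error term `prediction - target`, so instances the expert is less sure about (targets near 0.5) pull the decision boundary less strongly than confidently labeled ones. Ordinal categorical labels would need a different treatment, which this sketch does not attempt.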
Our AUC results show that the new approach learns classification models more efficiently than traditional learning methods. The improvement in AUC is most pronounced when the number of training examples is small.
A classification learning framework that lets us learn from auxiliary soft-label information provided by a human expert is a promising new direction for learning classification models from expert labels, reducing the time and cost needed to label data.