Wang Xin, Bi Jinbo
IEEE/ACM Trans Comput Biol Bioinform. 2017 May-Jun;14(3):564-575. doi: 10.1109/TCBB.2016.2576457. Epub 2016 Jun 7.
The problem of constructing classifiers from multiple annotators who provide inconsistent training labels is important and occurs in many application domains. Many existing methods focus on understanding and learning crowd behavior. Several probabilistic algorithms construct task-specific classifiers from the consensus of multiple labelers' annotations. These methods impose a prior on the consensus and develop an expectation-maximization algorithm based on the logistic regression loss. We extend the discussion to the hinge loss commonly used by support vector machines. Our formulations are bi-convex programs that construct classifiers and estimate the reliability of each labeler simultaneously. Each labeler is associated with a reliability parameter, which can be constant, class-dependent, or example-dependent. The hinge loss is modified by replacing the true labels with a weighted combination of the labelers' labels, using the reliabilities as weights. A statistical justification is given to motivate the use of a linear combination of labels. In parallel to the expectation-maximization algorithm for logistic-based methods, efficient alternating algorithms are developed to solve the proposed bi-convex programs. Experimental results on benchmark datasets and three real-world biomedical problems demonstrate that the proposed methods either outperform or are competitive with the state of the art.
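The modified hinge loss and the alternating scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes labels in {-1, +1}, covers only the constant-reliability case, constrains the reliabilities `alpha` to the probability simplex (an assumption, since the abstract does not state the constraints), and uses plain subgradient steps as a stand-in solver for each convex subproblem. All function names, the regularization weight `lam`, and the step sizes are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def objective(X, Y, w, b, alpha, lam):
    """Regularized hinge loss with the true label replaced by Y @ alpha."""
    y_hat = Y @ alpha                      # weighted combination of labelers' labels
    margins = 1.0 - y_hat * (X @ w + b)
    return 0.5 * lam * (w @ w) + np.mean(np.maximum(margins, 0.0))

def fit_alternating(X, Y, lam=0.01, outer=20, inner=200, lr=0.1):
    """Alternate between the classifier (w, b) and the reliabilities alpha.

    X: (n, d) features. Y: (n, m) labels in {-1, +1}, one column per labeler.
    Each subproblem is convex when the other block of variables is fixed.
    """
    n, d = X.shape
    m = Y.shape[1]
    w, b = np.zeros(d), 0.0
    alpha = np.full(m, 1.0 / m)            # assumption: start from equal reliabilities
    for _ in range(outer):
        # Step 1: fix alpha, take subgradient steps on (w, b).
        y_hat = Y @ alpha
        for t in range(inner):
            active = (1.0 - y_hat * (X @ w + b)) > 0
            coef = active * y_hat
            grad_w = lam * w - (X * coef[:, None]).sum(axis=0) / n
            grad_b = -coef.sum() / n
            step = lr / np.sqrt(t + 1)
            w = w - step * grad_w
            b = b - step * grad_b
        # Step 2: fix (w, b), take projected subgradient steps on alpha.
        f = X @ w + b
        for t in range(inner):
            active = (1.0 - (Y @ alpha) * f) > 0
            grad_a = -(Y * (active * f)[:, None]).sum(axis=0) / n
            alpha = project_to_simplex(alpha - lr / np.sqrt(t + 1) * grad_a)
    return w, b, alpha

# Synthetic demo (fabricated toy data, for illustration only):
# three labelers who flip the true label with probability 0.05, 0.10, and 0.50.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y_true = np.where(X @ np.array([2.0, -1.0]) >= 0, 1.0, -1.0)
flip_rates = np.array([0.05, 0.10, 0.50])
flips = rng.random((300, 3)) < flip_rates
Y = np.where(flips, -y_true[:, None], y_true[:, None])

w, b, alpha = fit_alternating(X, Y)
accuracy = np.mean(np.sign(X @ w + b) == y_true)
print("alpha:", alpha, "accuracy vs. true labels:", accuracy)
```

Because the combined label `Y @ alpha` is linear in `alpha` and the decision value `X @ w + b` is linear in `(w, b)`, the hinge term is convex in each block when the other is held fixed, which is what makes the alternating scheme natural here. The class-dependent and example-dependent reliability variants mentioned above would need a different parameterization of `alpha` and are not covered by this sketch.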