Inter-labeler and intra-labeler variability of condition severity classification models using active and passive learning methods.

Author information

Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel; Malware Lab, Cyber Security Research Center, Ben-Gurion University of the Negev, Beer-Sheva, Israel.

Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel.

Publication information

Artif Intell Med. 2017 Sep;81:12-32. doi: 10.1016/j.artmed.2017.03.003. Epub 2017 Apr 27.

Abstract

BACKGROUND AND OBJECTIVES

Labeling instances by domain experts for classification is often time-consuming and expensive. To reduce such labeling efforts, we previously proposed the application of active learning (AL) methods, introduced our CAESAR-ALE framework for classifying the severity of clinical conditions, and showed that it significantly reduces labeling efforts. The use of any of three AL methods (one well known [SVM-Margin], and two that we introduced [Exploitation and Combination_XA]) significantly reduced (by 48% to 64%) condition labeling efforts, compared to standard passive (random instance-selection) SVM learning. Furthermore, our new AL methods achieved maximal accuracy using 12% fewer labeled cases than the SVM-Margin AL method. However, because labelers have varying levels of expertise, a major issue associated with learning methods, and AL methods in particular, is how best to use the labeling provided by a committee of labelers. First, we wanted to know, based on the labelers' learning curves, whether using AL methods (versus standard passive learning methods) has an effect on the intra-labeler variability (within the learning curve of each labeler) and inter-labeler variability (among the learning curves of different labelers). Then, we wanted to examine the effect of learning (either passively or actively) from the labels created by the majority consensus of a group of labelers.
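The contrast between passive (random) selection and margin-based active selection can be illustrated with a minimal sketch. It assumes the general idea behind SVM-Margin sampling, namely querying the unlabeled instances that lie closest to the SVM decision boundary; it is not the CAESAR-ALE implementation, and it does not cover the Exploitation or Combination_XA methods. The function names and the signed decision scores are hypothetical.

```python
import random

def select_margin(decision_scores, k):
    """Pick the k pool instances closest to the decision boundary.

    decision_scores maps an instance id to a signed distance from the
    separating hyperplane (hypothetical values, e.g. from an SVM).
    """
    ranked = sorted(decision_scores, key=lambda i: abs(decision_scores[i]))
    return ranked[:k]

def select_random(decision_scores, k, seed=0):
    """Passive baseline: pick k pool instances uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(sorted(decision_scores), k)

# Hypothetical pool of four unlabeled conditions.
scores = {"c1": 2.1, "c2": -0.1, "c3": 0.4, "c4": -1.5}
to_label_actively = select_margin(scores, 2)   # the two least certain instances
to_label_passively = select_random(scores, 2)  # any two instances
```

In an AL loop, the selected instances would be sent to the labeler, added to the training set, and the model retrained before the next selection round.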

METHODS

We used our CAESAR-ALE framework for classifying the severity of clinical conditions, together with the three AL methods and the passive learning method mentioned above, to induce the classification models. We used a dataset of 516 clinical conditions and their severity labeling, represented by features aggregated from the medical records of 1.9 million patients treated at Columbia University Medical Center. We analyzed the variance of the classification performance within (intra-labeler), and especially among (inter-labeler), the classification models that were induced by using the labels provided by seven labelers. We also compared the performance of the passive and active learning models when using the consensus label.
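A majority-consensus label can be sketched as a simple vote per condition. This is a minimal illustration, assuming binary severity labels (1 = severe, 0 = not severe) and an odd number of labelers so that ties cannot occur; the abstract does not state how the consensus was actually computed, and the function name and toy data are hypothetical.

```python
from collections import Counter

def consensus_labels(labels_by_labeler):
    """Return the majority-vote label for each condition.

    labels_by_labeler maps a labeler name to a list of labels; all lists
    are assumed to follow the same condition order.
    """
    per_condition = zip(*labels_by_labeler.values())
    return [Counter(votes).most_common(1)[0][0] for votes in per_condition]

# Hypothetical labels from three labelers for three conditions.
labels = {
    "labeler_a": [1, 0, 1],
    "labeler_b": [1, 1, 0],
    "labeler_c": [0, 1, 1],
}
consensus = consensus_labels(labels)
```

With an even number of labelers or more than two classes, a tie-breaking rule would have to be chosen explicitly.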

RESULTS

The AL methods produced, for the models induced from each labeler, smoother intra-labeler learning curves during the training phase, compared to the models produced when using the passive learning method. The mean standard deviation of the learning curves of the three AL methods over all labelers (mean: 0.0379; range: [0.0182 to 0.0496]) was significantly lower (p=0.049) than the intra-labeler standard deviation when using the passive learning method (mean: 0.0484; range: [0.0275 to 0.0724]). Using the AL methods resulted in a lower mean inter-labeler AUC standard deviation among the AUC values of the labelers' different models during the training phase, compared to the variance of the induced models' AUC values when using passive learning. The inter-labeler AUC standard deviation using the passive learning method (0.039) was almost twice as high as the inter-labeler standard deviation using our two new AL methods (0.02 and 0.019, respectively). The SVM-Margin AL method resulted in an inter-labeler standard deviation (0.029) that was higher by almost 50% than that of our two AL methods. The difference in the inter-labeler standard deviation between the passive learning method and the SVM-Margin learning method was significant (p=0.042). The difference between the SVM-Margin and Exploitation methods was not significant (p=0.29), nor was the difference between the Combination_XA and Exploitation methods (p=0.67). Finally, using the consensus label led to a learning curve that had a higher mean intra-labeler variance, but resulted eventually in an AUC that was at least as high as the AUC achieved using the gold standard label and that was always higher than the expected mean AUC of a randomly selected labeler, regardless of the choice of learning method (including a passive learning method).
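The two variability measures can be sketched from per-labeler learning curves. This is one plausible reading of the abstract, not the authors' exact procedure: intra-labeler variability is taken as the standard deviation of AUC along a single labeler's curve, and inter-labeler variability as the standard deviation of AUC across labelers at each training step, averaged over steps. The use of the sample standard deviation and the AUC values below are assumptions.

```python
from statistics import mean, stdev

def intra_labeler_sd(curves):
    """SD of AUC along each labeler's own learning curve."""
    return {labeler: stdev(aucs) for labeler, aucs in curves.items()}

def inter_labeler_sd(curves):
    """Mean, over training steps, of the SD of AUC across labelers."""
    per_step = zip(*curves.values())
    return mean([stdev(step_aucs) for step_aucs in per_step])

# Hypothetical AUC values at three training steps for three labelers.
curves = {
    "labeler_1": [0.70, 0.80, 0.90],
    "labeler_2": [0.60, 0.80, 0.85],
    "labeler_3": [0.65, 0.80, 0.95],
}
intra = intra_labeler_sd(curves)
inter = inter_labeler_sd(curves)
```

A smoother learning curve yields a smaller intra-labeler value, and curves that track each other closely across labelers yield a smaller inter-labeler value.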
Using a paired t-test, the difference between the intra-labeler AUC standard deviation when using the consensus label, versus that value when using the other two labeling strategies, was significant only when using the passive learning method (p=0.014), but not when using any of the three AL methods.
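The paired t-test mentioned above reduces to a statistic computed on per-pair differences. The following is a minimal standard-library sketch of that statistic; what exactly was paired in the study (for example, standard deviations per labeler or per training step) is not stated in the abstract, so the inputs here are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample SD of the paired differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / sqrt(n)), n - 1

# Hypothetical paired measurements under two conditions.
t, df = paired_t_statistic([1, 2, 3, 4], [0, 1, 1, 2])
```

Converting t to a p-value requires the Student t distribution with the returned degrees of freedom, which the standard library does not provide; `scipy.stats.ttest_rel` computes both in one call.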

CONCLUSIONS

The use of AL methods (a) reduces intra-labeler variability in the performance of the induced models during the training phase, and thus reduces the risk of halting the process at a local minimum that is significantly different in performance from the rest of the learned models; and (b) reduces inter-labeler performance variance, and thus reduces the dependence on the use of a particular labeler. In addition, the use of a consensus label, agreed upon by a rather uneven group of labelers, might be at least as good as using the gold standard labeler, who might not be available, and certainly better than randomly selecting one of the group's individual labelers. Finally, using the AL methods when provided with the consensus label reduced the intra-labeler AUC variance during the learning phase, compared to using passive learning.
