Inter-labeler and intra-labeler variability of condition severity classification models using active and passive learning methods.

Affiliations

Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel; Malware Lab, Cyber Security Research Center, Ben-Gurion University of the Negev, Beer-Sheva, Israel.

Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel.

Publication information

Artif Intell Med. 2017 Sep;81:12-32. doi: 10.1016/j.artmed.2017.03.003. Epub 2017 Apr 27.

DOI: 10.1016/j.artmed.2017.03.003

PMID: 28456512

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5937023/

Abstract

BACKGROUND AND OBJECTIVES

Labeling instances for classification by domain experts is often time-consuming and expensive. To reduce such labeling efforts, we previously proposed the application of active learning (AL) methods, introduced our CAESAR-ALE framework for classifying the severity of clinical conditions, and showed that it significantly reduces labeling efforts. Each of three AL methods (one well known [SVM-Margin] and two that we introduced [Exploitation and Combination_XA]) significantly reduced condition-labeling efforts, by 48% to 64%, compared with standard passive (random instance-selection) SVM learning. Furthermore, our new AL methods achieved maximal accuracy using 12% fewer labeled cases than the SVM-Margin AL method. However, because labelers have varying levels of expertise, a major issue for learning methods, and for AL methods in particular, is how best to use the labels provided by a committee of labelers. First, we wanted to know, based on the labelers' learning curves, whether using AL methods (versus standard passive learning methods) affects intra-labeler variability (within the learning curve of each labeler) and inter-labeler variability (among the learning curves of different labelers). Then, we wanted to examine the effect of learning (either passively or actively) from the labels created by the majority consensus of a group of labelers.
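The difference between passive (random) instance selection and a margin-based AL query can be sketched as follows. This is a minimal illustration, assuming a linear SVM whose decision function is f(x) = w·x + b; the helper names `margin_select` and `passive_select` are hypothetical, and the sketch does not reproduce CAESAR-ALE or the Exploitation and Combination_XA methods, which the abstract does not describe.

```python
import random

def margin_select(pool, w, b):
    """Index of the unlabeled instance closest to the linear SVM hyperplane.

    Assumes f(x) = w.x + b. |f(x)| is proportional to the geometric distance
    (||w|| is the same for every x), so the smallest |f(x)| is the most
    uncertain instance, which margin-based AL queries next.
    """
    def abs_decision_value(x):
        return abs(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return min(range(len(pool)), key=lambda i: abs_decision_value(pool[i]))

def passive_select(pool, rng):
    """Passive learning baseline: pick an unlabeled instance uniformly at random."""
    return rng.randrange(len(pool))

# Hypothetical unlabeled pool and model parameters (not from the study).
pool = [[2.0, 1.0], [0.1, 0.2], [-1.5, 0.5]]
w, b = [1.0, 1.0], -0.2
chosen = margin_select(pool, w, b)                   # -> 1
random_pick = passive_select(pool, random.Random(0))
```

In an AL loop, the selected instance would be sent to a labeler, added to the training set, and the model retrained before the next query; the passive baseline follows the same loop but ignores the model when choosing.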

METHODS

We used our CAESAR-ALE framework for classifying the severity of clinical conditions, together with the three AL methods and the passive learning method mentioned above, to induce the classification models. We used a dataset of 516 clinical conditions and their severity labels, represented by features aggregated from the medical records of 1.9 million patients treated at Columbia University Medical Center. We analyzed the variance of classification performance within (intra-labeler) and, especially, among (inter-labeler) the classification models induced from the labels provided by seven labelers. We also compared the performance of the passive and active learning models when using the consensus label.
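A majority-consensus label can be derived from the seven labelers' labels by a per-instance vote. The sketch below assumes binary severity labels (1 = severe, 0 = not severe) and uses invented votes; `consensus_labels` is a hypothetical helper, not the authors' code.

```python
from collections import Counter

def consensus_labels(labels_by_labeler):
    """Majority-vote consensus label for each instance.

    labels_by_labeler: one equal-length list of labels per labeler.
    With an odd number of labelers and binary labels, a strict majority
    always exists, so no tie-breaking rule is needed.
    """
    n_instances = len(labels_by_labeler[0])
    consensus = []
    for i in range(n_instances):
        votes = Counter(labeler[i] for labeler in labels_by_labeler)
        label, _count = votes.most_common(1)[0]
        consensus.append(label)
    return consensus

# Invented labels from 7 labelers (rows) for 4 conditions (columns).
labels = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]
consensus = consensus_labels(labels)   # -> [1, 0, 1, 0]
```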

RESULTS

The AL methods produced smoother intra-labeler learning curves during the training phase for the models induced from each labeler, compared with the models produced by the passive learning method. The mean standard deviation of the learning curves of the three AL methods over all labelers (mean: 0.0379; range: [0.0182 to 0.0496]) was significantly lower (p=0.049) than the intra-labeler standard deviation when using the passive learning method (mean: 0.0484; range: [0.0275 to 0.0724]). Using the AL methods also resulted in a lower mean inter-labeler AUC standard deviation among the AUC values of the labelers' different models during the training phase, compared with passive learning. The inter-labeler AUC standard deviation using the passive learning method (0.039) was almost twice as high as that using our two new AL methods (0.02 and 0.019, respectively). The SVM-Margin AL method resulted in an inter-labeler standard deviation (0.029) that was almost 50% higher than that of our two AL methods. The difference in inter-labeler standard deviation between the passive learning method and the SVM-Margin learning method was significant (p=0.042). The difference between the SVM-Margin and Exploitation methods was not significant (p=0.29), nor was the difference between the Combination_XA and Exploitation methods (p=0.67). Finally, using the consensus label led to a learning curve with a higher mean intra-labeler variance, but eventually resulted in an AUC that was at least as high as the AUC achieved using the gold standard label, and that was always higher than the expected mean AUC of a randomly selected labeler, regardless of the learning method chosen (including passive learning).
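The two kinds of variability reported in the results can be computed from per-labeler learning curves (AUC after each training step). This sketch assumes that the inter-labeler SD is the sample SD of AUC across labelers at each step, averaged over steps, and, as an assumed proxy for curve smoothness, that the intra-labeler SD is the SD of the step-to-step AUC changes along one curve. The paper's exact definitions may differ, and the AUC values are invented.

```python
from statistics import mean, stdev

def inter_labeler_sd(curves):
    # curves: one learning curve per labeler, each a list of AUC values
    # measured at the same training steps. For each step, take the sample
    # SD of the AUCs across labelers, then average those SDs over steps.
    per_step = [stdev(step_aucs) for step_aucs in zip(*curves)]
    return mean(per_step)

def intra_labeler_sd(curve):
    # Assumed smoothness proxy: sample SD of the step-to-step AUC changes.
    # A smooth curve changes by similar amounts each step (low SD); a jagged
    # curve alternates between gains and drops (high SD).
    deltas = [later - earlier for earlier, later in zip(curve, curve[1:])]
    return stdev(deltas)

# Invented learning curves for three labelers over four training steps.
curves = [
    [0.70, 0.75, 0.80, 0.82],
    [0.68, 0.76, 0.79, 0.83],
    [0.72, 0.74, 0.81, 0.81],
]
inter = inter_labeler_sd(curves)   # per-step SDs 0.02, 0.01, 0.01, 0.01 -> 0.0125
smooth = intra_labeler_sd(curves[0])
jagged = intra_labeler_sd([0.70, 0.80, 0.72, 0.82])
```

For the reported values, 0.039 / 0.020 ≈ 1.95 and 0.039 / 0.019 ≈ 2.05, so the passive-learning inter-labeler SD is about twice that of the two new AL methods; likewise 0.029 / 0.020 ≈ 1.45 and 0.029 / 0.019 ≈ 1.53, so the SVM-Margin value is roughly 50% higher.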
A paired t-test showed that the difference between the intra-labeler AUC standard deviation when using the consensus label and that value when using the other two labeling strategies was significant only when using the passive learning method (p=0.014), not when using any of the three AL methods.
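A paired t-test compares two measurements taken on the same units through their differences. A minimal sketch of the test statistic, using only the standard library and invented values, follows; converting t to a p-value requires the Student t distribution with n - 1 degrees of freedom (for example via SciPy's `scipy.stats.ttest_rel`), which is omitted here. `paired_t_statistic` is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(x, y):
    """Return (t, degrees_of_freedom) for a paired t-test of x against y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    # Mean paired difference divided by its standard error.
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1

# Invented paired measurements, e.g. one SD value per unit under two strategies.
x = [0.05, 0.06, 0.07]
y = [0.04, 0.04, 0.04]
t, df = paired_t_statistic(x, y)   # differences 0.01, 0.02, 0.03 -> t = 2*sqrt(3), df = 2
```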

CONCLUSIONS

The use of AL methods (a) reduces intra-labeler variability in the performance of the induced models during the training phase, and thus reduces the risk of halting the process at a local minimum whose performance differs significantly from that of the rest of the learned models; and (b) reduces inter-labeler performance variance, and thus reduces dependence on any particular labeler. In addition, a consensus label agreed upon by a rather uneven group of labelers might be at least as good as the label of the gold standard labeler, who might not be available, and is certainly better than the label of an individual labeler selected at random from the group. Finally, when trained on the consensus label, the AL methods reduced the intra-labeler AUC variance during the learning phase compared with passive learning.
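The comparison with "a randomly selected labeler" rests on two quantities: the AUC of a model's scores against reference labels, and the mean of the per-labeler AUCs, which is the expected AUC when one labeler is chosen uniformly at random. The sketch below computes AUC with the Mann-Whitney pairwise formulation; the helper names and numbers are hypothetical, not the study's evaluation code or results.

```python
from statistics import mean

def auc(scores, labels):
    """ROC AUC: probability that a random positive outscores a random negative.

    Mann-Whitney formulation; tied scores count as one half.
    labels are 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def expected_random_labeler_auc(aucs_by_labeler):
    """Expected AUC if one labeler is chosen uniformly at random: the plain mean."""
    return mean(aucs_by_labeler)

# Invented example: model scores for four conditions and their reference labels.
example_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])   # 3 of 4 pairs ranked correctly -> 0.75
expected = expected_random_labeler_auc([0.80, 0.84, 0.78])
```

A consensus-label model "beats a randomly selected labeler" in this sense when its AUC exceeds `expected`, even if some individual labelers score higher.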
