Applying active learning to assertion classification of concepts in clinical text.

Affiliation

Department of Biomedical Informatics, Vanderbilt University, School of Medicine, Nashville, TN, USA.

Publication information

J Biomed Inform. 2012 Apr;45(2):265-72. doi: 10.1016/j.jbi.2011.11.003. Epub 2011 Nov 22.

DOI:10.1016/j.jbi.2011.11.003

PMID:22127105

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3306548/

Abstract

Supervised machine learning methods for clinical natural language processing (NLP) require large numbers of annotated samples, which are expensive to build because annotation involves physicians. Active learning, an approach that actively selects samples for annotation from a large pool, provides an alternative. Its major goal in classification is to reduce annotation effort while maintaining the quality of the predictive model. However, few studies have investigated its use in clinical NLP. This paper reports an application of active learning to a clinical text classification task: determining the assertion status of clinical concepts. The annotated corpus for the assertion classification task in the 2010 i2b2/VA Clinical NLP Challenge was used in this study. We implemented several existing and newly developed active learning algorithms and assessed their performance. Results are reported as the global ALC score, which is based on the area under the average learning curve of the AUC (area under the curve) score. When the same number of annotated samples was used, active learning strategies generated better classification models (best ALC = 0.7715) than the passive learning method, random sampling (ALC = 0.7411). Moreover, to achieve the same classification performance, active learning strategies required fewer samples than random sampling. For example, to achieve an AUC of 0.79, random sampling used 32 samples, while our best active learning algorithm required only 12 samples, a reduction of 62.5% in manual annotation effort.
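To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes least-confidence uncertainty sampling as one common querying strategy (the abstract does not name the algorithms that were evaluated), and it computes a simplified ALC as the trapezoidal area under AUC versus log2(number of samples), divided by the x-range; the paper's exact normalization may differ. All function names and the example learning curve are hypothetical.

```python
import math


def select_most_uncertain(probabilities, batch_size):
    """Least-confidence uncertainty sampling (an assumed strategy).

    Returns the indices of the pool samples whose predicted positive-class
    probability is closest to 0.5, i.e. the ones the model is least sure about.
    """
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:batch_size]


def area_under_learning_curve(sample_counts, auc_scores):
    """Simplified ALC (assumed definition, not necessarily the paper's).

    Trapezoidal area under AUC plotted against log2(sample count),
    divided by the x-range so that a constant AUC of 1.0 yields 1.0.
    """
    xs = [math.log2(n) for n in sample_counts]
    area = sum((xs[i + 1] - xs[i]) * (auc_scores[i] + auc_scores[i + 1]) / 2
               for i in range(len(xs) - 1))
    return area / (xs[-1] - xs[0])


def annotation_reduction(passive_samples, active_samples):
    """Fraction of annotation effort saved relative to random sampling."""
    return (passive_samples - active_samples) / passive_samples


if __name__ == "__main__":
    # Hypothetical pool probabilities: pick the two least certain samples.
    print(select_most_uncertain([0.9, 0.52, 0.1, 0.45], batch_size=2))

    # Made-up learning curve for illustration only (not results from the paper).
    print(area_under_learning_curve([2, 4, 8, 16], [0.5, 0.6, 0.7, 0.8]))

    # Figures from the abstract: 32 samples (random) vs. 12 (active learning).
    print(annotation_reduction(32, 12))  # 0.625, i.e. the reported 62.5%
```

The last call reproduces the arithmetic behind the reported reduction: (32 - 12) / 32 = 0.625. Plotting the learning curve on a log2 axis gives more weight to the early part of the curve, where few samples have been annotated and a good querying strategy matters most.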
