An active learning-enabled annotation system for clinical named entity recognition.

Affiliations

Pieces Technologies Inc, Dallas, TX, USA.

Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.

Publication information

BMC Med Inform Decis Mak. 2017 Jul 5;17(Suppl 2):82. doi: 10.1186/s12911-017-0466-9.

Abstract

BACKGROUND

Active learning (AL) has shown promising potential to minimize annotation cost while maximizing performance when building statistical natural language processing (NLP) models. However, very few studies have investigated AL in a real-life setting in the medical domain.

METHODS

In this study, we developed the first AL-enabled annotation system for clinical named entity recognition (NER), which uses a novel AL algorithm. In addition to a simulation study evaluating the novel AL algorithm, we conducted user studies in which two nurses used the system, to assess how AL performs in real-world annotation processes for building clinical NER models.
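The abstract does not describe the novel query algorithm, so the following is only a minimal sketch of the query step in a generic pool-based AL loop, set against the random-sampling baseline. It assumes a hypothetical input format (per-token label probabilities from the current NER model for each unlabeled sentence) and uses least-confidence uncertainty sampling, a common textbook strategy, with the best-sequence probability approximated by treating tokens as independent. All function names, labels and probabilities are illustrative and are not taken from the paper.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical input format: for each unlabeled sentence, the current NER
# model's per-token label distributions (label -> probability).
TokenDist = Dict[str, float]
Pool = Dict[str, List[TokenDist]]


def least_confidence(token_dists: Sequence[TokenDist]) -> float:
    """Return 1 minus the approximate probability of the best tag sequence.

    Assumption: tokens are treated as independent, so the best-sequence
    probability is approximated by multiplying each token's top probability.
    """
    best_seq_prob = 1.0
    for dist in token_dists:
        best_seq_prob *= max(dist.values())
    return 1.0 - best_seq_prob


def select_uncertain(pool: Pool, k: int) -> List[str]:
    """Uncertainty sampling: ids of the k sentences with the highest score."""
    ranked = sorted(pool, key=lambda sid: least_confidence(pool[sid]), reverse=True)
    return ranked[:k]


def select_random(pool: Pool, k: int, seed: int = 0) -> List[str]:
    """Random-sampling baseline: k sentence ids drawn uniformly without replacement."""
    rng = random.Random(seed)
    return rng.sample(sorted(pool), k)


# Toy pool with made-up probabilities, for illustration only.
example_pool: Pool = {
    "s1": [{"O": 0.95, "B-problem": 0.05}, {"O": 0.90, "I-problem": 0.10}],
    "s2": [{"O": 0.55, "B-problem": 0.45}, {"O": 0.60, "I-problem": 0.40}],
    "s3": [{"O": 0.99, "B-test": 0.01}],
}

if __name__ == "__main__":
    print(select_uncertain(example_pool, 2))  # ['s2', 's1']
    print(select_random(example_pool, 2))
```

In each round of such a loop, the selected sentences would be shown to the annotator, added to the training set, and the model retrained before the next query; the random-sampling condition simply swaps `select_uncertain` for `select_random`.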

RESULTS

The simulation results show that the novel AL algorithm outperformed a traditional AL algorithm and random sampling. However, the user studies told a different story: AL methods did not consistently outperform random sampling across users.

CONCLUSIONS

We found that the increased information content of actively selected sentences was strongly offset by the increased time required to annotate them. Moreover, annotation time was not considered in the querying algorithms. Our future work includes developing better AL algorithms that incorporate estimates of annotation time and evaluating the system with a larger number of users.
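One way to read this future-work item is to rank candidate sentences by expected information per unit of annotation time rather than by information alone. The sketch below assumes a hypothetical linear time model (a fixed overhead plus a per-token cost, with placeholder coefficients) and uses an uncertainty score as the information measure; it illustrates the trade-off described in the conclusions and is not the authors' algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    sentence_id: str
    n_tokens: int
    uncertainty: float  # information proxy, e.g. a least-confidence score in [0, 1]


def estimated_seconds(n_tokens: int, base: float = 5.0, per_token: float = 0.8) -> float:
    """Hypothetical annotation-time model: fixed overhead plus a per-token cost.

    The coefficients are placeholders; in practice they would have to be
    fit to logged annotation times, possibly per annotator.
    """
    return base + per_token * n_tokens


def uncertainty_rank(cands: List[Candidate]) -> List[str]:
    """Rank by uncertainty alone, ignoring how long annotation will take."""
    return [c.sentence_id for c in sorted(cands, key=lambda x: x.uncertainty, reverse=True)]


def cost_aware_rank(cands: List[Candidate]) -> List[str]:
    """Rank by uncertainty per estimated second of annotation time."""
    def score(x: Candidate) -> float:
        return x.uncertainty / estimated_seconds(x.n_tokens)
    return [c.sentence_id for c in sorted(cands, key=score, reverse=True)]


# Made-up candidates: a long, very uncertain sentence and a short, less uncertain one.
candidates = [
    Candidate("long", n_tokens=60, uncertainty=0.9),
    Candidate("short", n_tokens=10, uncertainty=0.6),
]

if __name__ == "__main__":
    print(uncertainty_rank(candidates))  # ['long', 'short']
    print(cost_aware_rank(candidates))   # ['short', 'long']
```

With these placeholder numbers, uncertainty alone prefers the long sentence (0.9 vs 0.6), while the cost-aware score prefers the short one (0.6 / 13 s vs 0.9 / 53 s), which mirrors the observation that highly informative sentences can cost disproportionately more time to annotate.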

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bfa/5506567/e5afff814ec5/12911_2017_466_Fig1_HTML.jpg
