Cost-aware active learning for named entity recognition in clinical text.

Author affiliations

School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA.

Pieces Technologies Inc, Dallas, Texas, USA.

Publication information

J Am Med Inform Assoc. 2019 Nov 1;26(11):1314-1322. doi: 10.1093/jamia/ocz102.

DOI: 10.1093/jamia/ocz102
PMID: 31294792
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6798575/
Abstract

OBJECTIVE

Active Learning (AL) attempts to reduce annotation cost (ie, time) by selecting the most informative examples for annotation. Most approaches tacitly (and unrealistically) assume that the cost for annotating each sample is identical. This study introduces a cost-aware AL method, which simultaneously models both the annotation cost and the informativeness of the samples and evaluates both via simulation and user studies.

MATERIALS AND METHODS

We designed a novel, cost-aware AL algorithm (Cost-CAUSE) for annotating clinical named entities; we first utilized lexical and syntactic features to estimate annotation cost, then we incorporated this cost measure into an existing AL algorithm. Using the 2010 i2b2/VA data set, we then conducted a simulation study comparing Cost-CAUSE with noncost-aware AL methods, and a user study comparing Cost-CAUSE with passive learning.
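The selection idea described above can be sketched as follows. This is a minimal illustration, not the Cost-CAUSE algorithm itself: the abstract does not state how cost and informativeness are combined, so the ratio used here, the linear cost model based on token count, and all names (`Sentence`, `estimate_cost`, `select_batch`) and coefficients are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    text: str
    # Informativeness score from the current NER model (assumed to be given).
    uncertainty: float


def estimate_cost(sentence: Sentence, base: float = 2.0, per_token: float = 0.5) -> float:
    """Hypothetical linear cost model: annotation seconds grow with token count.

    The coefficients are illustrative placeholders, not values from the paper.
    """
    n_tokens = len(sentence.text.split())
    return base + per_token * n_tokens


def select_batch(pool: List[Sentence], batch_size: int) -> List[Sentence]:
    """Rank unlabeled sentences by informativeness per unit of estimated cost."""
    ranked = sorted(
        pool,
        key=lambda s: s.uncertainty / estimate_cost(s),
        reverse=True,
    )
    return ranked[:batch_size]


# Made-up example pool; the uncertainty values are illustrative only.
pool = [
    Sentence("Patient denies chest pain .", 0.40),
    Sentence(
        "CT of the abdomen showed a 3 cm mass in the left lobe of the liver "
        "with no evidence of metastasis .",
        0.60,
    ),
    Sentence("Started on metoprolol 25 mg bid .", 0.55),
]
batch = select_batch(pool, batch_size=2)
for s in batch:
    print(f"{s.uncertainty / estimate_cost(s):.3f}  {s.text}")
```

In this toy pool the longest sentence has the highest uncertainty but is not selected, because its estimated annotation cost outweighs its informativeness. A selector that ignores cost would have picked it first.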

RESULTS

Our cost model fit the empirical annotation data well, and Cost-CAUSE increased the area under the learning curve (ALC) score in simulation by up to 5.6% and 4.9% compared with random sampling and alternate AL methods, respectively. Moreover, in a user annotation task, Cost-CAUSE outperformed passive learning on the ALC score and reduced annotation time by 20.5%-30.2%.
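For readers unfamiliar with the ALC metric, the sketch below shows one common way to compute it: the trapezoidal area under a curve of model F1 against cumulative annotation time, normalized by the time span. The abstract does not give the exact definition used in the study, so the normalization, the function names, and every number here are assumptions chosen for illustration and are not results from the paper.

```python
from typing import Sequence


def area_under_learning_curve(times: Sequence[float], f1_scores: Sequence[float]) -> float:
    """Trapezoidal area under an F1-vs-annotation-time curve, divided by the time span."""
    if len(times) != len(f1_scores) or len(times) < 2:
        raise ValueError("need at least two (time, F1) points of equal length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (f1_scores[i] + f1_scores[i - 1]) / 2.0
    return area / (times[-1] - times[0])


def relative_time_saving(baseline_minutes: float, method_minutes: float) -> float:
    """Fraction of annotation time saved relative to a baseline."""
    return (baseline_minutes - method_minutes) / baseline_minutes


# Made-up learning curves (minutes of annotation vs. F1), for illustration only.
minutes = [0, 10, 20, 30]
f1_random = [0.50, 0.60, 0.66, 0.70]
f1_active = [0.50, 0.65, 0.70, 0.72]

alc_random = area_under_learning_curve(minutes, f1_random)
alc_active = area_under_learning_curve(minutes, f1_active)
saving = relative_time_saving(baseline_minutes=100, method_minutes=75)
print(f"ALC random={alc_random:.4f}, ALC active={alc_active:.4f}, time saved={saving:.0%}")
```

Plotting performance against annotation time rather than the number of annotated sentences is what makes the comparison cost-aware: a method that selects fewer but slower sentences gains nothing on a time axis.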

DISCUSSION

Although AL has proven effective in simulations, our user study shows that a real-world environment is far more complex. Other factors have a noticeable effect on the AL method, such as the annotation accuracy of users, the tiredness of users, and even the physical and mental condition of users.

CONCLUSION

Cost-CAUSE saves significant annotation cost compared to random sampling.
