Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation.

Affiliation

Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA.

Publication information

J Am Med Inform Assoc. 2013 Sep-Oct;20(5):931-9. doi: 10.1136/amiajnl-2012-001453. Epub 2013 Mar 13.


DOI: 10.1136/amiajnl-2012-001453
PMID: 23486109
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3756264/
Abstract

OBJECTIVE: Natural language processing (NLP) tasks are commonly decomposed into subtasks, chained together to form processing pipelines. The residual error produced in these subtasks propagates, adversely affecting the end objectives. Limited availability of annotated clinical data remains a barrier to reaching state-of-the-art operating characteristics using statistically based NLP tools in the clinical domain. Here we explore the unique linguistic constructions of clinical texts and demonstrate the loss in operating characteristics when out-of-the-box part-of-speech (POS) tagging tools are applied to the clinical domain. We test a domain adaptation approach integrating a novel lexical-generation probability rule used in a transformation-based learner to boost POS performance on clinical narratives.

METHODS: Two target corpora from independent healthcare institutions were constructed from high frequency clinical narratives. Four leading POS taggers with their out-of-the-box models trained from general English and biomedical abstracts were evaluated against these clinical corpora. A high performing domain adaptation method, Easy Adapt, was compared to our newly proposed method ClinAdapt.

RESULTS: The evaluated POS taggers drop in accuracy by 8.5-15% when tested on clinical narratives. The highest performing tagger reports an accuracy of 88.6%. Domain adaptation with Easy Adapt reports accuracies of 88.3-91.0% on clinical texts. ClinAdapt reports 93.2-93.9%.

CONCLUSIONS: ClinAdapt successfully boosts POS tagging performance through domain adaptation requiring a modest amount of annotated clinical data. Improving the performance of critical NLP subtasks is expected to reduce pipeline error propagation leading to better overall results on complex processing tasks.
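
The abstract names Easy Adapt as the baseline domain adaptation method but does not describe how it works. Assuming it refers to the feature-augmentation technique commonly known by that name, the idea can be sketched roughly as follows: each feature is duplicated into a shared copy and a domain-specific copy, so a learner can share weights across domains where they agree and diverge where they do not. The function name `easy_adapt`, the `general::` prefix, and the toy token features are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict


def easy_adapt(features: Dict[str, float], domain: str) -> Dict[str, float]:
    """Return a feature dict holding a shared copy and a domain-specific copy.

    Hypothetical helper for illustration; the paper's actual feature set
    and implementation are not given in the abstract.
    """
    augmented: Dict[str, float] = {}
    for name, value in features.items():
        augmented[f"general::{name}"] = value   # copy seen by every domain
        augmented[f"{domain}::{name}"] = value  # copy seen only by this domain
    return augmented


# Toy features for the same token observed in two domains (assumed names).
token_features = {"word=pt": 1.0, "suffix=t": 1.0}
source_token = easy_adapt(token_features, "source")  # e.g. general English
target_token = easy_adapt(token_features, "target")  # e.g. clinical notes

# Only the shared copies overlap, which is what allows transfer across domains.
shared = set(source_token) & set(target_token)
```

Any standard linear tagger could then be trained on the augmented features from both domains together; which learner the authors paired with this baseline is not stated in the abstract.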

