Towards a semantic lexicon for clinical natural language processing.

Author information

Liu Hongfang, Wu Stephen T, Li Dingcheng, Jonnalagadda Siddhartha, Sohn Sunghwan, Wagholikar Kavishwar, Haug Peter J, Huff Stanley M, Chute Christopher G

Affiliation

Mayo Clinic College of Medicine, Rochester, MN, USA.

Publication information

AMIA Annu Symp Proc. 2012;2012:568-76. Epub 2012 Nov 3.

Abstract

A semantic lexicon that associates words and phrases in text with concepts is critical for extracting and encoding clinical information from free text, and therefore for achieving semantic interoperability between structured and unstructured data in Electronic Health Records (EHRs). Existing standard terminologies, used directly, may offer limited coverage of concepts and their corresponding mentions in text. In this paper, we analyze how tokens and phrases are distributed in a large corpus and how well the UMLS captures their semantics. We constructed a corpus-driven semantic lexicon, MedLex, whose semantics are based on the UMLS, supplemented with variants mined and usage information gathered from clinical text. The detailed corpus analysis of tokens, chunks, and concept mentions shows that the UMLS is an invaluable source for natural language processing. Increasing the semantic coverage of tokens provides a good foundation for capturing clinical information comprehensively. The study also yields some insights into developing practical NLP systems.
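To make the idea of lexicon-based token coverage concrete, here is a minimal sketch, not the MedLex implementation. It assumes a toy lexicon keyed by token tuples, a greedy longest-match lookup, and coverage defined as the fraction of tokens inside a matched span. The lexicon entries, the concept IDs (placeholders, not real UMLS CUIs), and the function names `tokenize`, `match_concepts`, and `token_coverage` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: phrase (tuple of lowercase tokens) -> concept ID.
# The IDs are placeholders for illustration, not real UMLS identifiers.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("chest", "pain"): "CONCEPT_001",
    ("pain",): "CONCEPT_002",
    ("aspirin",): "CONCEPT_003",
}
MAX_PHRASE_LEN = max(len(key) for key in LEXICON)

Span = Tuple[int, int, str]  # (start index, end index exclusive, concept ID)


def tokenize(text: str) -> List[str]:
    """Split on whitespace, strip surrounding punctuation, and lowercase."""
    stripped = (tok.strip(".,;:()") for tok in text.split())
    return [tok.lower() for tok in stripped if tok]


def match_concepts(tokens: List[str]) -> List[Span]:
    """Greedy longest-match lookup of token n-grams in the lexicon."""
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in LEXICON:
                spans.append((i, i + n, LEXICON[key]))
                i += n
                break
        else:
            i += 1  # No lexicon entry starts at this token.
    return spans


def token_coverage(tokens: List[str], spans: List[Span]) -> float:
    """Fraction of tokens that fall inside some matched concept span."""
    if not tokens:
        return 0.0
    covered = sum(end - start for start, end, _ in spans)
    return covered / len(tokens)


tokens = tokenize("Patient reports chest pain, took aspirin.")
spans = match_concepts(tokens)
coverage = token_coverage(tokens, spans)
```

Under these assumptions, "chest pain" resolves to the two-token entry rather than the single-token "pain" entry, which is the usual motivation for longest-match lookup. A corpus-driven lexicon would additionally fold in variants and usage counts observed in clinical text, which this sketch omits.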
