Towards a semantic lexicon for clinical natural language processing.

Author information

Liu Hongfang, Wu Stephen T, Li Dingcheng, Jonnalagadda Siddhartha, Sohn Sunghwan, Wagholikar Kavishwar, Haug Peter J, Huff Stanley M, Chute Christopher G

Affiliation

Mayo Clinic College of Medicine, Rochester, MN, USA.

Publication information

AMIA Annu Symp Proc. 2012;2012:568-76. Epub 2012 Nov 3.

PMID:23304329

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3540492/

Abstract

A semantic lexicon that associates words and phrases in text with concepts is critical for extracting and encoding clinical information in free text, and therefore for achieving semantic interoperability between structured and unstructured data in Electronic Health Records (EHRs). Existing standard terminologies, used directly, may have limited coverage of concepts and of their corresponding mentions in text. In this paper, we analyze how tokens and phrases are distributed in a large corpus and how well the Unified Medical Language System (UMLS) captures their semantics. We constructed a corpus-driven semantic lexicon, MedLex, whose semantics are based on the UMLS, supplemented with variants mined and usage information gathered from clinical text. The detailed corpus analysis of tokens, chunks, and concept mentions shows that the UMLS is an invaluable source for natural language processing. Increasing the semantic coverage of tokens provides a good foundation for capturing clinical information comprehensively. The study also yields some insights into developing practical NLP systems.
