Lu Chris J, Payne Amanda, Mork James G
Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA.
J Am Med Inform Assoc. 2020 May 29;27(10):1600-5. doi: 10.1093/jamia/ocaa056.
Natural language processing (NLP) plays a vital role in modern medical informatics. It converts narrative text or unstructured data into knowledge by analyzing and extracting concepts. A comprehensive lexical system is the foundation for the success of NLP applications and an essential component at the beginning of the NLP pipeline. The SPECIALIST Lexicon and Lexical Tools, distributed by the National Library of Medicine as one of the Unified Medical Language System Knowledge Sources, provide an underlying resource for many NLP applications. This article reports recent developments in 3 key components of the Lexicon. The core NLP operation of Unified Medical Language System concept mapping is used to illustrate the importance of these developments. Our objective is to provide a generic, broad-coverage, and robust lexical system for NLP applications. A novel multiword approach and other planned developments are proposed.