Med Data Quest, Inc., La Jolla, California, USA.
J Am Med Inform Assoc. 2019 Nov 1;26(11):1218-1226. doi: 10.1093/jamia/ocz109.
Identifying patients who meet selection criteria for clinical trials is typically challenging and time-consuming. In this article, we describe our clinical natural language processing (NLP) system to automatically assess patients' eligibility based on their longitudinal medical records. This work was part of the 2018 National NLP Clinical Challenges (n2c2) Shared-Task and Workshop on Cohort Selection for Clinical Trials.
The authors developed an integrated rule-based clinical NLP system built on a generic rule-based framework into which task-specific knowledge inputs are plugged at the lexical, syntactic, and meta levels. In addition, the authors implemented and evaluated a general clinical NLP (cNLP) system built with the Unified Medical Language System (UMLS) and the Unstructured Information Management Architecture (UIMA).
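The abstract does not spell out the framework's rules. As a rough illustration only, the sketch below shows one way a generic rule engine could consume lexical, syntactic, and meta-level knowledge for a single eligibility criterion. Everything in it is a hypothetical assumption, not the authors' method: the `CriterionRule` structure, the keyword patterns, the negation-cue list (a crude stand-in for syntactic analysis), the record-level `min_mentions` threshold, and the `DRUG-ABUSE-example` criterion name.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical sketch: a generic rule engine whose behavior is driven by
# task-specific knowledge inputs. Not the authors' actual framework or rules.


@dataclass
class CriterionRule:
    name: str
    # Lexical level: regular expressions for concept mentions.
    lexical_patterns: list[str]
    # Syntactic level (crude proxy): cues that negate a following mention.
    negation_cues: list[str] = field(
        default_factory=lambda: ["no", "denies", "negative for", "without"]
    )
    negation_window: int = 5  # number of preceding tokens scanned for cues
    # Meta level: record-level decision threshold across all notes.
    min_mentions: int = 1


def _is_negated(sentence: str, start: int, rule: CriterionRule) -> bool:
    """True if a negation cue occurs within the window before position `start`."""
    preceding = " ".join(sentence[:start].lower().split()[-rule.negation_window:])
    return any(
        re.search(rf"\b{re.escape(cue)}\b", preceding) for cue in rule.negation_cues
    )


def count_positive_mentions(note: str, rule: CriterionRule) -> int:
    """Count non-negated pattern matches, sentence by sentence."""
    count = 0
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        for pattern in rule.lexical_patterns:
            for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
                if not _is_negated(sentence, match.start(), rule):
                    count += 1
    return count


def meets_criterion(notes: list[str], rule: CriterionRule) -> str:
    """Aggregate evidence over a patient's longitudinal notes into met / not met."""
    total = sum(count_positive_mentions(note, rule) for note in notes)
    return "met" if total >= rule.min_mentions else "not met"


if __name__ == "__main__":
    # Hypothetical criterion and toy notes, for illustration only.
    rule = CriterionRule(
        name="DRUG-ABUSE-example",
        lexical_patterns=[r"\bcocaine\b", r"\bheroin\b", r"\bIV drug use\b"],
    )
    notes = [
        "Patient denies cocaine use. Lives alone.",
        "History of heroin use, in remission.",
    ]
    print(meets_criterion(notes, rule))  # met
```

Keeping the engine generic and pushing all criterion-specific content into the rule object reflects the general idea of a framework "plugged in" with knowledge inputs; a real system would need far richer negation, temporal, and section handling than this toy version.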
The systems were evaluated as part of the 2018 n2c2-1 challenge. The authors' rule-based system achieved an F-measure of 0.9028, ranking fourth in the challenge and scoring within 1% of the best system. Although the general cNLP system did not perform as well as the rule-based system, it showed advantages and potential of its own for extracting clinical concepts.
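In its standard F1 form, the F-measure is the harmonic mean of precision $P$ and recall $R$, $F_1 = 2PR/(P+R)$. The abstract does not state how scores were averaged across criteria and classes, so this sketch only shows the per-label computation on toy data; the function names and the data are hypothetical, and the official n2c2 scoring may aggregate differently.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f_measure_for_label(gold: list, pred: list, label: str = "met") -> float:
    """F1 for one label over aligned gold/predicted decisions (hypothetical helper)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be the same length")
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    return f_measure(tp, fp, fn)


if __name__ == "__main__":
    # Toy decisions for five hypothetical patients on one criterion.
    gold = ["met", "met", "not met", "met", "not met"]
    pred = ["met", "not met", "not met", "met", "met"]
    # tp = 2, fp = 1, fn = 1, so P = R = 2/3 and F1 = 2/3.
    print(round(f_measure_for_label(gold, pred), 4))  # 0.6667
```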
Our results indicate that a well-designed rule-based clinical NLP system can achieve good performance on cohort selection even with a small training data set. In addition, our investigation of a UMLS-based general cNLP system suggests that a hybrid system combining the two approaches could surpass state-of-the-art performance.