Mayo Clinic, Rochester, MN.
Northwestern University, Chicago, IL.
AMIA Jt Summits Transl Sci Proc. 2021 May 17;2021:624-633. eCollection 2021.
The lack of a standardized representation for natural language processing (NLP) components in phenotyping algorithms hinders the portability of those algorithms and their execution in a high-throughput, reproducible manner. The objective of this study was to develop and evaluate a standard-driven approach, CQL4NLP, that integrates a collection of NLP extensions represented in the HL7 Fast Healthcare Interoperability Resources (FHIR) standard into the Clinical Quality Language (CQL). A minimal NLP data model with 11 NLP-specific data elements was created, including six FHIR NLP extensions. All 11 data elements were identified from their usage in real-world phenotyping algorithms. An NLP ruleset generation mechanism was integrated into the NLP2FHIR pipeline, and the NLP rulesets enabled comparable performance in a case study on the identification of obesity comorbidities. The NLP ruleset generation mechanism created a reproducible process for defining the NLP components of a phenotyping algorithm and for executing it.