Department of Epidemiology, College of Public Health & Health Professions & College of Medicine, University of Florida, Gainesville, Florida, USA.
Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA.
J Am Med Inform Assoc. 2023 Jul 19;30(8):1418-1428. doi: 10.1093/jamia/ocad080.
This study aimed to develop a natural language processing (NLP) algorithm using machine learning (ML) techniques to identify and classify documentation of preoperative cannabis use status.
We developed and applied a keyword search strategy to identify documentation of preoperative cannabis use status in clinical notes written within 60 days of surgery. We manually reviewed matching notes and classified each documented mention into 8 categories based on the context, time, and certainty of the cannabis use documentation. We evaluated 2 conventional ML models and 3 deep learning models against the manual annotations. We externally validated our model using the MIMIC-III dataset.
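The keyword search step can be illustrated with a minimal sketch. It is not the study's implementation: the term list, the function name `find_preoperative_mentions`, and the note format are hypothetical, and the 60-day window is assumed to mean the 60 days up to and including the surgery date.

```python
import re
from datetime import date, timedelta

# Hypothetical mini-lexicon for illustration; the study's lexicon is not reproduced here.
CANNABIS_TERMS = ["cannabis", "marijuana", "thc", "cannabidiol"]
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in CANNABIS_TERMS) + r")\b",
    re.IGNORECASE,
)

def find_preoperative_mentions(notes, surgery_date, window_days=60):
    """Return (note_date, matched_term, snippet) for each keyword hit in notes
    dated within `window_days` before (or on) the surgery date.

    `notes` is an iterable of (note_date, text) pairs (an assumed format).
    """
    window_start = surgery_date - timedelta(days=window_days)
    hits = []
    for note_date, text in notes:
        # Skip notes outside the assumed preoperative window.
        if not (window_start <= note_date <= surgery_date):
            continue
        for match in PATTERN.finditer(text):
            # Keep some surrounding context for the later manual review step.
            start = max(0, match.start() - 40)
            end = min(len(text), match.end() + 40)
            hits.append((note_date, match.group(0).lower(), text[start:end]))
    return hits

example_notes = [
    (date(2023, 1, 10), "Patient denies marijuana use."),
    (date(2022, 9, 1), "Uses cannabis daily."),   # outside the 60-day window
    (date(2023, 1, 20), "No tobacco use."),       # no keyword match
]
mentions = find_preoperative_mentions(example_notes, date(2023, 2, 1))
```

The matched snippets would then go to manual review or a classifier, since a keyword hit alone does not distinguish, for example, denied use from current use.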
The tested classifiers achieved classification results close to human performance with up to 93% and 94% precision and 95% recall of preoperative cannabis use status documentation. External validation showed consistent results with up to 94% precision and recall.
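As a reminder of how the reported metrics are defined, the sketch below computes precision and recall for one category by comparing predicted labels with manual annotations. The category names and example labels are hypothetical and are not taken from the study's data.

```python
def precision_recall(gold, predicted, positive_label):
    """Compute precision and recall for one category.

    `gold` holds the manual annotations and `predicted` the classifier output,
    aligned by position. Returns 0.0 for a metric whose denominator is zero.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive_label and p == positive_label)
    fp = sum(1 for g, p in pairs if g != positive_label and p == positive_label)
    fn = sum(1 for g, p in pairs if g == positive_label and p != positive_label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical category labels for illustration only.
gold_labels = ["current_use", "current_use", "denied", "past_use"]
pred_labels = ["current_use", "denied", "denied", "current_use"]

current_p, current_r = precision_recall(gold_labels, pred_labels, "current_use")
denied_p, denied_r = precision_recall(gold_labels, pred_labels, "denied")
```

With 8 categories, per-category values like these are typically averaged into a single summary figure; the abstract does not state which averaging scheme was used.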
Our NLP model successfully replicated human annotation of preoperative cannabis use documentation, providing a baseline framework for identifying and classifying documentation of cannabis use. We add to NLP methods applied in healthcare for clinical concept extraction and classification, mainly concerning social determinants of health and substance use. Our systematically developed lexicon provides a comprehensive knowledge-based resource covering a wide range of cannabis-related concepts for future NLP applications.
We demonstrated that documentation of preoperative cannabis use status could be accurately identified using an NLP algorithm. This approach can be employed to identify comparison groups based on cannabis exposure for growing research efforts aiming to guide cannabis-related clinical practices and policies.