Development and Evaluation of Machine Learning Models for the Detection of Emergency Department Patients with Opioid Misuse from Clinical Notes.

Authors

Shahid Usman, Parde Natalie, Smith Dale L, Dickinson Grayson, Bianco Joseph, Thorpe Dillon, Hota Madhav, Afshar Majid, Karnik Niranjan S, Chhabra Neeraj

Affiliations

AI.Health4All Center for Health Equity using Machine Learning and Artificial Intelligence, College of Medicine, University of Illinois Chicago, Chicago, IL, USA.

Natural Language Processing Laboratory, Department of Computer Science, University of Illinois Chicago, Chicago, IL, USA.

Publication

medRxiv. 2024 Dec 12:2024.12.11.24318875. doi: 10.1101/2024.12.11.24318875.

PMID: 39711725

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11661385/

Abstract

OBJECTIVES

The accurate identification of Emergency Department (ED) encounters involving opioid misuse is critical for health services, research, and surveillance. We sought to develop natural language processing (NLP)-based models for the detection of ED encounters involving opioid misuse.

METHODS

A sample of ED encounters enriched for opioid misuse was manually annotated, and the associated clinical notes were extracted. We evaluated classic machine learning (ML) methods, fine-tuned publicly available pretrained language models, and a previously developed convolutional neural network opioid classifier for hospitalized patients (SMART-AI). Performance was compared with ICD-10-CM diagnosis codes. Models were evaluated on both raw text and text transformed to Unified Medical Language System (UMLS) concepts. Face validity was assessed using term-level feature importance.
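The classic ML arm of the methods can be illustrated with a minimal, dependency-free sketch: a bag-of-words logistic regression trained on raw note text, whose largest positive coefficients act as a crude term-importance check in the spirit of the face-validity analysis. Everything here is a hypothetical illustration, not the study's pipeline: the toy notes in `TRAIN`, the regex tokenizer, and the learning rate and epoch count are assumptions. The study's other classic models (SVM, decision trees, XGBoost) and the UMLS transformation step are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical hand-written toy notes (label 1 = opioid misuse); NOT study data.
TRAIN = [
    ("Pt reports daily heroin use, found down, narcan given.", 1),
    ("Hx opioid use disorder on suboxone, requests refill.", 1),
    ("Overdose on opiates after relapse; heroin.", 1),
    ("Chest pain radiating to left arm, no drug use.", 0),
    ("Ankle sprain after fall, no drug use.", 0),
    ("Cough and fever for three days.", 0),
]


def tokenize(text):
    """Lowercase and keep alphabetic tokens only (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def featurize(text, vocab):
    """Bag-of-words term counts aligned to the vocabulary order."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocab]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(examples, lr=0.5, epochs=200):
    """Full-batch gradient descent on the unregularized logistic loss."""
    vocab = sorted({t for text, _ in examples for t in tokenize(text)})
    X = [featurize(text, vocab) for text, _ in examples]
    y = [label for _, label in examples]
    n = len(X)
    w = [0.0] * len(vocab)
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for x, target in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
            err = p - target
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        # Average the gradient over the batch and take one step.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return vocab, w, b


def predict_proba(text, vocab, w, b):
    """Predicted probability that a note describes opioid misuse."""
    x = featurize(text, vocab)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


vocab, w, b = train_logreg(TRAIN)
weights = dict(zip(vocab, w))
# Term-level "feature importance": the five largest positive coefficients.
top_terms = sorted(weights, key=weights.get, reverse=True)[:5]
print(top_terms)
```

Coefficient magnitudes are only comparable across terms because every feature is a raw count on the same scale; with tree ensembles such as XGBoost, importance would instead come from split statistics.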

RESULTS

A total of 1123 encounters were used for training, validation, and testing. Among the classic ML methods, XGBoost had the highest area under the precision-recall curve (AU_PRC, 0.936), accuracy (0.887), and F1 score (0.863), outperforming ICD-10-CM codes [accuracy 0.870; F1 0.830]. Logistic regression, support vector machine, and XGBoost models achieved higher AU_PRC on transformed text, whereas decision trees performed better on raw text. With the exception of XGBoost, fine-tuned pretrained language models outperformed the classic ML methods. The best-performing model by AU_PRC was the fine-tuned SMART-AI-based model with domain adaptation [AU_PRC 0.948; accuracy 0.882; F1 0.851]. Explainability analyses showed that the most predictive terms were 'heroin', 'opioids', 'alcoholic intoxication, chronic', 'cocaine', 'opiates', and 'suboxone'.
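The three reported metrics can be made concrete with a small sketch. The abstract does not say how AU_PRC was computed, so this assumes the step-wise average-precision estimator (the quantity scikit-learn's `average_precision_score` reports when scores have no ties). The labels, scores, and the 0.5 threshold below are hypothetical toy values, not study data.

```python
def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1(y_true, y_pred):
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
    tp, fp, fn, _ = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def average_precision(y_true, scores):
    """Step-wise area under the PR curve: mean precision at each positive's rank.

    Tied scores are not treated specially; they keep their input order.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos if n_pos else 0.0


# Hypothetical toy labels and model scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred), f1(y_true, y_pred), average_precision(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, whereas AU_PRC summarizes the whole score ranking. A binary ICD-10-CM code flag yields only a single precision-recall point, so accuracy and F1 are the natural metrics for comparing against it.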

CONCLUSIONS

NLP-based models outperform ICD-10-CM diagnosis codes for detecting ED encounters involving opioid misuse. Fine-tuning pretrained language models with domain adaptation improved performance.
