Department of Biomedical Sciences and Biomedical Engineering, Faculty of Medicine, Prince of Songkla University, Hatyai, Songkhla, Thailand.
Division of Digital Innovation and Data Analytics, Faculty of Medicine, Prince of Songkla University, Hatyai, Songkhla, Thailand.
PLoS One. 2022 Aug 4;17(8):e0270595. doi: 10.1371/journal.pone.0270595. eCollection 2022.
Allergic reactions to medication range from mild to severe or even life-threatening. Proper documentation of patient allergy information is critical for safe prescribing, avoiding drug interactions, and reducing healthcare costs. Allergy information is routinely obtained during the medical interview but is often poorly documented in electronic health records (EHRs). While many EHRs allow for structured adverse drug reaction (ADR) reporting, free-text entry is still common. The resulting information is neither interoperable nor easily reusable by other applications, such as clinical decision support systems and prescription alerts. Current approaches require pharmacists to review and code ADRs documented by healthcare professionals. Recently, the effectiveness of machine learning algorithms in natural language processing (NLP) has been widely demonstrated. Our study aims to develop and evaluate different NLP algorithms that can encode unstructured ADRs stored in EHRs into institutional symptom terms. Our dataset consists of 79,712 pharmacist-reviewed drug allergy records. We evaluated three NLP techniques: Naive Bayes-Support Vector Machine (NB-SVM), Universal Language Model Fine-tuning (ULMFiT), and Bidirectional Encoder Representations from Transformers (BERT). We tested different general-domain pre-trained BERT models, including mBERT, XLM-RoBERTa, and WanchanBERTa, as well as our domain-specific AllergyRoBERTa, which was pre-trained from scratch on our corpus. Overall, BERT models had the highest performance. NB-SVM outperformed ULMFiT and BERT for several infrequently coded symptom terms. The ensemble model achieved an exact match ratio of 95.33%, an F1 score of 98.88%, and a mean average precision of 97.07% for the 36 most frequently coded symptom terms. The model was then further developed into a symptom term suggestion system and achieved a Krippendorff's alpha agreement coefficient of 0.7081 in prospective testing with pharmacists. Some degree of automation could both accelerate the availability of allergy information and reduce the effort required for human coding.
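To make two of the reported metrics concrete, the following is a minimal sketch of how an exact match ratio and an F1 score can be computed for multi-label symptom coding, where each record may carry several symptom terms. It is an illustration, not the study's evaluation code: the function names, the toy label matrices, and the choice of micro-averaging for F1 are all assumptions, since the abstract does not state how F1 was averaged.

```python
from typing import Sequence


def exact_match_ratio(y_true: Sequence[Sequence[int]],
                      y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of records whose full predicted label vector equals the true one."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


def micro_f1(y_true: Sequence[Sequence[int]],
             y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives
    over every (record, label) pair before computing F1.
    Micro-averaging is an assumption made for this sketch."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 3 records, 3 symptom terms (1 = term is coded).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]

print(f"exact match ratio: {exact_match_ratio(y_true, y_pred):.4f}")
print(f"micro F1:          {micro_f1(y_true, y_pred):.4f}")
```

The exact match ratio is the stricter of the two: a record counts only if every symptom term is right, whereas F1 gives partial credit per term, which is consistent with the F1 score in the abstract being higher than the exact match ratio.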