Binary acronym disambiguation in clinical notes from electronic health records with an application in computational phenotyping.

Author information

Link Nicholas B, Huang Sicong, Cai Tianrun, Sun Jiehuan, Dahal Kumar, Costa Lauren, Cho Kelly, Liao Katherine, Cai Tianxi, Hong Chuan

Affiliations

VA Boston Healthcare System, Boston, MA, United States; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States.

VA Boston Healthcare System, Boston, MA, United States; Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, United States.

Publication information

Int J Med Inform. 2022 Apr 1;162:104753. doi: 10.1016/j.ijmedinf.2022.104753.

Abstract

OBJECTIVE

The use of electronic health record (EHR) systems has grown over the past decade, and with it, the need to extract information from unstructured clinical narratives. Clinical notes, however, frequently contain acronyms with several potential senses (meanings), and traditional natural language processing (NLP) techniques cannot differentiate between these senses. In this study, we introduce a semi-supervised method for binary acronym disambiguation, the task of deciding whether an acronym in clinical EHR notes carries a target sense.

METHODS

We developed a semi-supervised ensemble machine learning (CASEml) algorithm to automatically identify when an acronym means a target sense by leveraging semantic embeddings, visit-level text, and billing information. The algorithm was validated using note data from the Veterans Affairs hospital system to classify the meaning of three acronyms: RA, MS, and MI. We compared the performance of CASEml against another standard semi-supervised method and a baseline that always selects the most frequent acronym sense. In addition to evaluating the performance of these methods on specific instances of acronyms, we evaluated the impact of acronym disambiguation on NLP-driven phenotyping of rheumatoid arthritis.
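To make the most-frequent-sense baseline concrete, the following is a minimal sketch, not the authors' code. The function names and the toy labels are hypothetical, chosen only for illustration; a label of 1 means the acronym mention carries the target sense and 0 means any other sense.

```python
from collections import Counter


def most_frequent_sense(train_labels):
    """Return the label that occurs most often among the labeled mentions."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predictions, gold):
    """Fraction of mentions whose predicted label matches the gold label."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Hypothetical toy data (not from the study): 1 = target sense, 0 = other sense.
train_labels = [1, 1, 0, 1, 0, 1]
test_labels = [1, 0, 1, 1]

# The baseline ignores context and predicts the majority sense for every mention.
majority = most_frequent_sense(train_labels)
predictions = [majority] * len(test_labels)
baseline_accuracy = accuracy(predictions, test_labels)
```

Because such a baseline never looks at the surrounding note, its accuracy equals the prevalence of the majority sense in the evaluation data, which is why it serves as a floor for context-aware methods.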

RESULTS

CASEml achieved accuracies of 0.947, 0.911, and 0.706 for RA, MS, and MI, respectively, higher than a standard baseline metric and (on average) higher than a state-of-the-art semi-supervised method. In addition, we demonstrated that applying CASEml to medical notes improves the AUC of a phenotype algorithm for rheumatoid arthritis.

CONCLUSION

CASEml is a novel method that accurately disambiguates acronyms in clinical notes and has advantages over commonly used supervised and semi-supervised machine learning approaches. In addition, CASEml improves the performance of NLP tasks that rely on ambiguous acronyms, such as phenotyping.
