Suppr超能文献

基于本体感知深度学习的医学缩略语自动消歧

Automatically disambiguating medical acronyms with ontology-aware deep learning.

机构信息

Department of Computer Science, University of Toronto, Toronto, Canada.

DATA Team & Techna Institute, University Health Network, Toronto, Canada.

出版信息

Nat Commun. 2021 Sep 7;12(1):5319. doi: 10.1038/s41467-021-25578-4.

Abstract

Modern machine learning (ML) technologies have great promise for automating diverse clinical and research workflows; however, training them requires extensive hand-labelled datasets. Disambiguating abbreviations is important for automated clinical note processing; however, broad deployment of ML for this task is restricted by the scarcity and imbalance of labeled training data. In this work we present a method that improves a model's ability to generalize through novel data augmentation techniques that utilizes information from biomedical ontologies in the form of related medical concepts, as well as global context information within the medical note. We train our model on a public dataset (MIMIC III) and test its performance on automatically generated and hand-labelled datasets from different sources (MIMIC III, CASI, i2b2). Together, these techniques boost the accuracy of abbreviation disambiguation by up to 17% on hand-labeled data, without sacrificing performance on a held-out test set from MIMIC III.

摘要

现代机器学习 (ML) 技术在自动化各种临床和研究工作流程方面具有巨大的潜力;然而,训练它们需要广泛的手动标记数据集。消除缩写词对于自动化临床记录处理很重要;然而,由于标记训练数据的稀缺性和不平衡,限制了 ML 对此任务的广泛应用。在这项工作中,我们提出了一种通过利用生物医学本体的相关医学概念形式的信息以及医学记录中的全局上下文信息的新的数据增强技术来提高模型泛化能力的方法。我们在一个公共数据集 (MIMIC III) 上训练我们的模型,并在来自不同来源的自动生成和手动标记数据集 (MIMIC III、CASI、i2b2) 上测试其性能。这些技术结合起来,在手写标记数据上将缩写词消歧的准确性提高了 17%,而不会牺牲对 MIMIC III 中保留测试集的性能。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f72/8423722/06afc8bb085b/41467_2021_25578_Fig1_HTML.jpg

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验