Norwegian Centre for E-health Research, Tromsø, Norway.
Department of Computer Science, UiT - The Arctic University of Norway, Tromsø, Norway.
AMIA Annu Symp Proc. 2024 Jan 11;2023:465-473. eCollection 2023.
Recent advances in natural language processing and deep learning have made it significantly more viable to develop tools that assist medical coders with ICD-10 diagnosis coding and increase their efficiency in coding discharge summaries. An important component in developing such models is the datasets used to train them. This study presents such datasets and shows that one of them can be used to develop a BERT-based language model that performs consistently well at assigning ICD-10 codes to discharge summaries written in Swedish. Most importantly, the model can be used in a coding-support setup in which a tool recommends potential codes to the coders. This narrows the range of codes a coder has to consider and, in turn, reduces the coder's workload. Moreover, the de-identified and pseudonymised dataset is available to academic users.
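The coding-support idea described above, where a tool narrows the candidate codes for a coder, can be illustrated with a minimal sketch. It assumes a multi-label classifier that outputs one raw score (logit) per ICD-10 code. The function name `recommend_codes`, the example codes, and the scores are hypothetical and are not taken from the study. This is not the authors' implementation.

```python
import math


def recommend_codes(logits, k=3):
    """Return the k highest-scoring codes with their sigmoid probabilities.

    `logits` maps an ICD-10 code to a raw model score. In a multi-label
    setup each code is scored independently, so a sigmoid (not a softmax)
    turns each score into a probability.
    """
    probs = {code: 1.0 / (1.0 + math.exp(-z)) for code, z in logits.items()}
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical scores for one discharge summary (illustrative values only).
example_logits = {"K35": 2.1, "K80": 0.3, "K57": -1.2, "K21": -0.4, "K29": 1.0}

shortlist = recommend_codes(example_logits, k=3)
for code, prob in shortlist:
    print(f"{code}: {prob:.2f}")
```

Under these assumptions, the coder reviews a shortlist of three codes instead of the full code set. The final choice stays with the coder.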