Nath Namrata, Lee Sang-Heon, Lee Ivan
UniSA STEM, University of South Australia, GPO Box 2471, Adelaide, SA, 5001, Australia.
UniSA STEM, University of South Australia, Adelaide, Australia.
Comput Biol Med. 2023 Oct;165:107422. doi: 10.1016/j.compbiomed.2023.107422. Epub 2023 Aug 30.
Notes documented by clinicians, such as patient histories, hospital courses and lab reports, are often annotated with standardized clinical codes by medical coders to support secondary uses such as billing and statistical analysis. Clinical coding has traditionally been manual and labor-intensive, and deep learning researchers have shown a surge of interest in automating it. However, deep learning methods require large volumes of annotated clinical data for training and offer little explanation of why a code was assigned to a piece of text. In this paper, we propose an unsupervised method that needs no annotated clinical text and is fully interpretable, using Named Entity and Attribute Recognition together with word embeddings specialized for the clinical domain. These methods successfully glean important information from large volumes of clinical notes and encode it effectively in order to perform automatic clinical coding.
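The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of embedding-based, label-free code assignment. It assumes a recognized entity is matched to code descriptions by cosine similarity of averaged word vectors. The vectors in `TOY_VECTORS`, the code identifiers `CODE_A` and `CODE_B`, and the function names are all hypothetical placeholders, not the authors' data, code set, or method.

```python
import math

# Hypothetical 3-dimensional word vectors. A real system would load
# embeddings trained on clinical text instead of this toy table.
TOY_VECTORS = {
    "chest": [0.9, 0.1, 0.0],
    "pain": [0.8, 0.2, 0.1],
    "angina": [0.85, 0.15, 0.05],
    "fracture": [0.0, 0.9, 0.2],
    "femur": [0.1, 0.8, 0.3],
    "broken": [0.05, 0.85, 0.25],
    "leg": [0.1, 0.75, 0.35],
}

# Hypothetical code table: identifier -> short textual description.
CODE_DESCRIPTIONS = {
    "CODE_A": "angina chest pain",
    "CODE_B": "fracture femur",
}


def embed(text):
    """Average the vectors of known words; return None if no word is known."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_code(entity):
    """Pick the code whose description is closest to the entity text.

    The returned dict keeps the matched description and score, so the
    assignment can be traced back to the text that triggered it.
    """
    entity_vec = embed(entity)
    if entity_vec is None:
        return None
    scored = []
    for code, description in CODE_DESCRIPTIONS.items():
        desc_vec = embed(description)
        if desc_vec is not None:
            scored.append((cosine(entity_vec, desc_vec), code, description))
    if not scored:
        return None
    score, code, description = max(scored)
    return {
        "entity": entity,
        "code": code,
        "matched_description": description,
        "score": score,
    }
```

No labeled notes are used at any point: the only inputs are the embedding table and the code descriptions, and each result carries the entity and description that produced it, which is one simple way to keep an assignment inspectable.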