Automatic ICD code assignment of Chinese clinical notes based on multilayer attention BiRNN.

Affiliations

School of Computer Science and Engineering, Central South University, Changsha 410083, China; School of Computer Science and Technology, University of South China, Hengyang 421001, China.

School of Computer Science and Engineering, Central South University, Changsha 410083, China.

Publication information

J Biomed Inform. 2019 Mar;91:103114. doi: 10.1016/j.jbi.2019.103114. Epub 2019 Feb 12.

Abstract

International Classification of Diseases (ICD) codes are important labels in electronic health records. Automatically assigning ICD codes from the narrative of clinical documents is an essential task that has drawn much attention recently. When the input corpus consists of Chinese clinical notes, properties of the Chinese language raise additional issues, such as the accuracy of word segmentation and the representation of individual Chinese characters, which carry meaning on their own. Taking into account both the length of patient notes and the representation of Chinese words, we present a multilayer attention bidirectional recurrent neural network (MA-BiRNN) model for assigning disease codes. A hierarchical approach represents the features of discharge summaries without manual feature engineering. Combining character-level and word-level embeddings improves the representation of words. An attention mechanism is introduced into bidirectional long short-term memory networks, which helps mitigate the performance degradation that plain recurrent neural networks suffer on long text sequences. Experiments are carried out on a real-world dataset containing 7732 Chinese admission records and 1177 unique ICD-10 labels. The proposed model achieves F1-scores of 0.639 on full-level codes and 0.766 on block-level codes. It outperforms the baseline neural network models and achieves the lowest Hamming loss. Ablation analysis indicates that the multilevel attention mechanism plays a decisive role in the system's handling of Chinese clinical notes.
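The abstract does not give the model's equations, so the following is only a rough illustration under stated assumptions, not the authors' implementation. It assumes the additive attention pooling popularized by hierarchical attention networks (word-level states pooled into sentence vectors, sentence vectors pooled into a document vector), a hypothetical word/character fusion that concatenates a word vector with the mean of its character vectors, and the standard multi-label Hamming loss. The BiLSTM encoder is omitted and its hidden states are taken as given. The names `attention_pool`, `combine_word_char`, and `hamming_loss` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: Sequence[Sequence[float]], W: Matrix,
                   b: Sequence[float], u: Sequence[float]) -> Tuple[Vector, Vector]:
    """Additive attention pooling over (Bi)RNN hidden states (assumed formulation).

    For each state h_t: v_t = tanh(W h_t + b) and score_t = v_t . u.
    The weights are alpha = softmax(scores); the output is sum_t alpha_t * h_t.
    """
    scores = []
    for h in hidden:
        v = [math.tanh(sum(w * x for w, x in zip(row, h)) + b_i)
             for row, b_i in zip(W, b)]
        scores.append(sum(v_i * u_i for v_i, u_i in zip(v, u)))
    alphas = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(a * h[i] for a, h in zip(alphas, hidden)) for i in range(dim)]
    return pooled, alphas


def combine_word_char(word_vec: Sequence[float],
                      char_vecs: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical fusion: word vector concatenated with the mean of its
    character vectors. The paper's actual combination may differ."""
    dim = len(char_vecs[0])
    char_mean = [sum(c[i] for c in char_vecs) / len(char_vecs) for i in range(dim)]
    return list(word_vec) + char_mean


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (record, code) slots whose 0/1 prediction is wrong."""
    wrong = sum(t != p for row_t, row_p in zip(y_true, y_pred)
                for t, p in zip(row_t, row_p))
    total = sum(len(row) for row in y_true)
    return wrong / total


if __name__ == "__main__":
    # Toy parameters shared by both levels for brevity; a real model would
    # learn separate W, b, u for the word level and the sentence level.
    W = [[1.0, 0.0], [0.0, 1.0]]
    b = [0.0, 0.0]
    u = [1.0, -1.0]
    # Two "sentences" of 2-d word states, standing in for BiLSTM outputs.
    sentences = [[[0.5, 0.1], [0.2, 0.9]], [[0.3, 0.3]]]
    sentence_vecs = [attention_pool(s, W, b, u)[0] for s in sentences]
    doc_vec, doc_alphas = attention_pool(sentence_vecs, W, b, u)
    print("document vector:", doc_vec, "sentence weights:", doc_alphas)
    print("fused word input:", combine_word_char([0.1, 0.2], [[0.0, 1.0], [1.0, 0.0]]))
    # One mismatch among six (record, code) slots.
    print("Hamming loss:", hamming_loss([[1, 0, 1], [0, 1, 0]], [[1, 1, 1], [0, 1, 0]]))
```

In a full multi-label model, an output layer over the document vector (for example one sigmoid per ICD-10 code, thresholded at 0.5) would produce the 0/1 predictions that Hamming loss compares; that output layer is likewise an assumption here. Attention lets the pooled vector weight informative positions directly instead of relying on the final recurrent state, which is the intuition behind using it on long discharge summaries.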