INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal.
Direção-Geral da Saúde, Portugal.
J Biomed Inform. 2018 Apr;80:64-77. doi: 10.1016/j.jbi.2018.02.011. Epub 2018 Feb 26.
We address the assignment of ICD-10 codes for causes of death by analyzing free-text descriptions in death certificates, together with the associated autopsy reports and clinical bulletins, from the Portuguese Ministry of Health. We use a deep neural network that combines word embeddings, recurrent units, and neural attention to generate intermediate representations of the textual contents. The network also exploits the hierarchical nature of the input data: it builds representations from the sequences of words within individual fields, and then combines them according to the sequence of fields that compose each input. Moreover, we explore innovative mechanisms for initializing the weights of the final nodes of the network, leveraging co-occurrences between classes together with the hierarchical structure of ICD-10. Experimental results attest to the contribution of the different neural network components. Our best model achieves accuracy scores over 89%, 81%, and 76% for ICD-10 chapters, blocks, and full codes, respectively. Through examples, we also show that our method can produce interpretable results that are useful for public health surveillance.
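The two-level composition described in the abstract (words pooled into field representations, fields pooled into a document representation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it omits the recurrent units and the classification layer, uses random vectors in place of learned word embeddings, and scores attention with a simple dot product against a context vector. All names (`attention_pool`, `encode_document`, `word_context`, `field_context`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array of scores."""
    shifted = scores - scores.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def attention_pool(vectors, context):
    """Pool an (n, d) matrix into one d-dimensional vector.

    Each row is scored by its dot product with `context` (an assumed,
    simplified scoring function); the softmax of the scores gives the
    attention weights used for the weighted sum.
    """
    weights = softmax(vectors @ context)
    return weights @ vectors, weights


def encode_document(fields, word_context, field_context):
    """Two-level pooling: words -> field vectors -> document vector.

    `fields` is a list of (n_words, d) arrays, one per text field
    (for example, one line of a death certificate).
    """
    field_vectors, word_weights = [], []
    for words in fields:
        pooled, weights = attention_pool(words, word_context)
        field_vectors.append(pooled)
        word_weights.append(weights)
    doc_vector, field_weights = attention_pool(
        np.stack(field_vectors), field_context
    )
    return doc_vector, field_weights, word_weights


# Toy input: three fields with 5, 3, and 7 words, 8-dimensional embeddings.
rng = np.random.default_rng(0)
dim = 8
fields = [rng.normal(size=(n, dim)) for n in (5, 3, 7)]
word_context = rng.normal(size=dim)    # stands in for a learned parameter
field_context = rng.normal(size=dim)   # stands in for a learned parameter

doc_vector, field_weights, word_weights = encode_document(
    fields, word_context, field_context
)
```

The returned `field_weights` and `word_weights` each sum to 1 and indicate which fields and words contributed most to the pooled vector. Inspecting attention weights of this kind is one way a model like the one described could yield the interpretable outputs the abstract mentions.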