Automated ICD coding for coronary heart diseases by a deep learning method.

Authors

Zhao Shuai, Diao Xiaolin, Xia Yun, Huo Yanni, Cui Meng, Wang Yuxin, Yuan Jing, Zhao Wei

Affiliations

Department of Information Center, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China.

Medical Record Department, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China.

Publication information

Heliyon. 2023 Feb 27;9(3):e14037. doi: 10.1016/j.heliyon.2023.e14037. eCollection 2023 Mar.

Abstract

Automated ICD coding with machine learning for specific diseases has been an active research topic. Although coronary heart disease (CHD) is one of the leading causes of death, it has seldom been studied specifically in this line of work, probably because data targeting the disease are scarce. This study aimed at automated CHD coding with a deep learning method, using two datasets: Fuwai-CHD, a private dataset from Fuwai Hospital, and MIMIC-III-CHD, the CHD-related subset of the public MIMIC-III dataset. The method consists mainly of three modules. The first is a BERT variant module responsible for encoding clinical text. In this module, we fine-tuned BERT variants with a masked language model objective on clinical text, and proposed a truncation method to address the problem that BERT variants generally cannot handle sequences longer than 512 tokens. The second is a word2vec module for encoding code titles, and the third is a label-attention module for integrating the embeddings of clinical text and code titles. In short, we named the method [name missing in source]. We compared the method against some widely studied baselines and found that it performed best in most of the coding tasks. Specifically, it reached an F1 of 96.2% and a [metric name missing in source] of 98.9% for the top-100 most frequent codes in Fuwai-CHD, which covered 89.2% of the total code occurrences. When predicting the top-50 most frequent codes in MIMIC-III-CHD, it reached an F1 of 40.5% and a [metric name missing in source] of 66.1%. Moreover, the method was able to locate informative tokens in clinical text for predicting the target codes. In summary, the method not only suggests CHD codes accurately but also offers robust interpretability, and hence has great potential to facilitate CHD coding in practice.
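
The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it mentions: shortening a token sequence to a 512-token limit, and a label-attention step that combines token embeddings with code-title embeddings. Every name here (`truncate_head_tail`, `label_attention`, `predict_codes`) is hypothetical, the head-plus-tail truncation rule is an assumption rather than the authors' method, and the dot-product attention is one common formulation, not necessarily the one used in the paper. Special tokens and learned projection layers are left out.

```python
import math


def truncate_head_tail(token_ids, max_len=512, head=128):
    """Keep the first `head` tokens and the last `max_len - head` tokens.

    Assumption: the abstract does not say which truncation rule the
    authors proposed; head+tail is used here purely for illustration.
    """
    if not 0 <= head <= max_len:
        raise ValueError("head must be between 0 and max_len")
    if len(token_ids) <= max_len:
        return list(token_ids)
    tail = max_len - head
    kept_tail = list(token_ids[-tail:]) if tail else []
    return list(token_ids[:head]) + kept_tail


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def label_attention(token_vecs, label_vecs):
    """For each label vector, attend over the token vectors.

    Returns one (weights, context) pair per label: `weights` sums to 1
    over tokens, and `context` is the weighted average of token vectors.
    """
    dim = len(token_vecs[0])
    results = []
    for label in label_vecs:
        weights = softmax([dot(label, tok) for tok in token_vecs])
        context = [
            sum(w * tok[d] for w, tok in zip(weights, token_vecs))
            for d in range(dim)
        ]
        results.append((weights, context))
    return results


def predict_codes(token_vecs, label_vecs):
    """Independent sigmoid score per code (multi-label setting)."""
    attended = label_attention(token_vecs, label_vecs)
    return [
        1.0 / (1.0 + math.exp(-dot(label, context)))
        for label, (_, context) in zip(label_vecs, attended)
    ]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings: two tokens, one code title.
    tokens = [[1.0, 0.0], [0.0, 1.0]]
    code_titles = [[5.0, 0.0]]
    weights, _ = label_attention(tokens, code_titles)[0]
    print("attention weights:", weights)
    print("code probabilities:", predict_codes(tokens, code_titles))
```

In a sketch like this, the per-label attention weights are what one would inspect to see which tokens contributed most to a given code, which is the kind of token-level interpretability the abstract refers to.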

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/87dd/10018467/bf1604c74d6a/gr1.jpg
