Towards automated clinical coding.

Affiliation

University College London, Gower Street, London WC1E 6BT, UK.

Publication information

Int J Med Inform. 2018 Dec;120:50-61. doi: 10.1016/j.ijmedinf.2018.09.021. Epub 2018 Oct 2.

Abstract

BACKGROUND

Patients' encounters with healthcare services must undergo clinical coding. These codes are typically derived from free-text notes. Manual clinical coding is expensive, time-consuming and prone to error. Automated clinical coding systems have great potential to save resources, and real-time availability of codes would improve oversight of patient care and accelerate research. Automated coding is made challenging by the idiosyncrasies of clinical text, the large number of disease codes and their unbalanced distribution.

METHODS

We explore methods for representing clinical text and the labels in hierarchical clinical coding ontologies. Text is represented as term frequency-inverse document frequency counts and then as word embeddings, which we use as input to recurrent neural networks. Labels are represented atomically, and then by learning representations of each node in a coding ontology and composing a representation for each label from its respective node path. We consider different strategies for initialisation of the node representations. We evaluate our methods using the publicly available Medical Information Mart for Intensive Care III dataset: we extract the history of presenting illness section from each discharge summary in the dataset, then predict the International Classification of Diseases, ninth revision, Clinical Modification codes associated with these.
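
To make the idea of composing a label from its node path concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how node representations are combined, so summation is assumed here, the random initialisation is only one possible strategy, and the function names and the three-level paths are hypothetical.

```python
import random


def init_node_vectors(paths, dim, seed=0):
    """Give every distinct ontology node a small random vector.

    Random initialisation is an assumption; the abstract only says that
    several initialisation strategies were considered.
    """
    rng = random.Random(seed)
    nodes = sorted({node for path in paths.values() for node in path})
    return {node: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for node in nodes}


def compose_label_vector(path, node_vectors):
    """Compose a label representation by summing the vectors on its node path.

    Summation is an assumed composition function chosen for illustration.
    """
    dim = len(next(iter(node_vectors.values())))
    total = [0.0] * dim
    for node in path:
        for i, value in enumerate(node_vectors[node]):
            total[i] += value
    return total


# Hypothetical root-to-leaf paths for two ICD-9-CM codes that share ancestors:
# chapter 390-459, category 428, then the full code.
PATHS = {
    "428.0": ["390-459", "428", "428.0"],
    "428.1": ["390-459", "428", "428.1"],
}

node_vectors = init_node_vectors(PATHS, dim=4)
label_vectors = {
    code: compose_label_vector(path, node_vectors) for code, path in PATHS.items()
}
```

In this sketch the two sibling codes share the vectors of their common ancestors and differ only through their leaf vectors, so a code with few training examples can still borrow information from better-represented relatives. An atomic representation, by contrast, would give each code an unrelated vector of its own.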

RESULTS

Composing the label representations from the clinical-coding-ontology nodes increased weighted F1 for prediction of the 17,561 disease labels to 0.264-0.281 from 0.232-0.249 for atomic representations. Recurrent neural network text representation improved weighted F1 for prediction of the 19 disease-category labels to 0.682-0.701 from 0.662-0.682 using term frequency-inverse document frequency. However, term frequency-inverse document frequency outperformed recurrent neural networks for prediction of the 17,561 disease labels.
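
For readers unfamiliar with the metric, the following sketch shows one common definition of weighted F1 for multi-label prediction: the per-label F1 scores are averaged with weights proportional to each label's support (its number of true occurrences). Whether the study used exactly this definition is an assumption; the function name and the toy data are hypothetical.

```python
def weighted_f1(true_sets, pred_sets):
    """Support-weighted mean of per-label F1 over multi-label predictions.

    true_sets and pred_sets are parallel lists of sets of label strings,
    one pair per document.
    """
    # Labels that never occur in the gold data have zero support and
    # therefore zero weight, so only gold labels need to be visited.
    labels = sorted(set().union(*true_sets))
    total_support = sum(len(t) for t in true_sets)
    if total_support == 0:
        return 0.0

    score = 0.0
    for label in labels:
        pairs = list(zip(true_sets, pred_sets))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        score += (tp + fn) * f1  # weight by support
    return score / total_support


# Toy example with three documents and three ICD-9-CM codes.
true_codes = [{"428.0", "401.9"}, {"401.9"}, {"250.00"}]
pred_codes = [{"428.0"}, {"401.9"}, {"401.9"}]
print(weighted_f1(true_codes, pred_codes))  # 0.5
```

Because the weights follow label frequency, common codes dominate this score, which is worth keeping in mind when comparing results on the 19 category labels with those on the 17,561 disease labels.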

CONCLUSIONS

This study demonstrates that hierarchically structured medical knowledge can be incorporated into statistical models, and that doing so improves performance in automated clinical coding. This performance improvement results primarily from improved representation of rarer diseases. We also show that recurrent neural networks improve representation of medical text in some settings. Learning good representations of the very rare diseases in clinical coding ontologies from data alone remains challenging, and alternative means of representing these diseases will form a major focus of future work on automated clinical coding.
