A hierarchical method to automatically encode Chinese diagnoses through semantic similarity estimation.

Authors

Ning Wenxin, Yu Ming, Zhang Runtong

Affiliations

Health Care Services Research Center, Department of Industrial Engineering, Tsinghua University, Beijing, 100084, PR China.

Department of Information Management, School of Economics and Management, Beijing Jiaotong University, Beijing, 100084, PR China.

Publication

BMC Med Inform Decis Mak. 2016 Mar 3;16:30. doi: 10.1186/s12911-016-0269-4.

Abstract

BACKGROUND

The volume of medical documents in China has increased rapidly in recent years. We develop a method that automatically assigns ICD-10 codes to Chinese diagnoses in electronic medical records, in order to support the medical coding process in Chinese hospitals.

METHODS

We propose two encoding methods: one that determines the desired code directly (flat method), and one that determines the most suitable code level by level until the desired code is obtained (hierarchical method). Both methods are based on instances from the standard diagnostic library, a gold standard dataset in China. For the first time, semantic similarity estimation between Chinese words is applied in the biomedical domain, with successful implementations of both knowledge-based and distributional approaches. Characteristics of the Chinese language are taken into account when implementing distributional semantics. We test our methods on 16,330 coding instances from our partner hospital.
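The contrast between the flat and hierarchical methods can be illustrated with a minimal sketch. This is not the authors' implementation: the toy `LIBRARY` entries, the character-overlap `similarity` function (a crude stand-in for the semantic similarity estimation described above), and the choice to score each group of codes by the pooled text of its instances are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy instance library: (diagnosis text, ICD-10 code).
# Illustrative entries only, not taken from the standard diagnostic library.
LIBRARY = [
    ("2型糖尿病", "E11.9"),
    ("1型糖尿病", "E10.9"),
    ("急性阑尾炎", "K35.9"),
    ("慢性胃炎", "K29.5"),
    ("原发性高血压", "I10"),
]


def similarity(a: str, b: str) -> float:
    """Character-level Jaccard overlap (assumed stand-in for semantic similarity)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def flat_encode(query: str) -> str:
    """Flat method: compare the query with every instance, return the best code."""
    return max(LIBRARY, key=lambda inst: similarity(query, inst[0]))[1]


def hierarchical_encode(query: str, prefix_lengths=(1, 3)) -> str:
    """Hierarchical method: narrow the candidates one code level at a time."""
    candidates = LIBRARY
    for n in prefix_lengths:
        # Group the remaining instances by the first n characters of their code.
        groups = defaultdict(list)
        for text, code in candidates:
            groups[code[:n]].append((text, code))
        # Assumption: a group is scored by the pooled text of its instances.
        best = max(
            groups,
            key=lambda p: similarity(query, "".join(t for t, _ in groups[p])),
        )
        candidates = groups[best]
    # Final step: pick the closest instance within the surviving group.
    return max(candidates, key=lambda inst: similarity(query, inst[0]))[1]


if __name__ == "__main__":
    print(flat_encode("糖尿病2型"), hierarchical_encode("糖尿病2型"))
```

In this sketch the hierarchical variant scores a handful of groups per level instead of every instance, which hints at why a hierarchical search can be cheaper on a large library. The actual levels and scoring used in the paper may differ.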

RESULTS

The hierarchical method outperforms the flat method in both accuracy and time complexity. Representing distributional semantics with Chinese characters achieves performance comparable to using Chinese words. The diagnoses in the test set can be encoded automatically with a micro-averaged precision of 92.57%, recall of 89.63%, and F-score of 91.08%. Encoding performance decreases sharply without semantic similarity estimation.
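For readers unfamiliar with micro-averaging, the following sketch shows how such figures are typically computed: true positives, false positives, and false negatives are pooled over all codes before the ratios are taken. The per-code counts are invented for illustration and the function names are hypothetical. The last line only checks that the reported F-score is the harmonic mean of the reported precision and recall.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def micro_prf(counts: dict) -> tuple:
    """counts maps each code to (true_pos, false_pos, false_neg).

    Micro-averaging pools the counts over all codes first,
    then computes precision, recall, and F-score once.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f_score(precision, recall)


if __name__ == "__main__":
    # Invented counts for two codes, for illustration only.
    toy_counts = {"E11.9": (8, 1, 2), "I10": (4, 1, 0)}
    print(micro_prf(toy_counts))

    # Consistency check on the figures reported in the abstract.
    print(round(f_score(92.57, 89.63), 2))  # 91.08
```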

CONCLUSION

The hierarchical nature of ICD-10 codes can enhance the performance of automated code assignment. Semantic similarity estimation is shown to be indispensable for dealing with Chinese medical text. The proposed method can greatly reduce the workload and improve the efficiency of the code assignment process in Chinese hospitals.
