Automatic ICD-10 coding algorithm using an improved longest common subsequence based on semantic similarity.

Authors

Chen YunZhi, Lu HuiJuan, Li LanJuan

Affiliations

Zhejiang University School of Medicine, Hangzhou, China.

Hangzhou Vocational and Technical College, Hangzhou, China.

Publication information

PLoS One. 2017 Mar 17;12(3):e0173410. doi: 10.1371/journal.pone.0173410. eCollection 2017.

DOI:10.1371/journal.pone.0173410

PMID:28306739

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5356997/

Abstract

ICD-10 (International Classification of Diseases, 10th revision) is a classification of diseases, symptoms, procedures, and injuries. In patients' medical records, diseases are often described in free text, such as terms, phrases and paraphrases, which differ significantly from the wording used in the ICD-10 classification. This paper presents an improved approach, based on the Longest Common Subsequence (LCS) and semantic similarity, for automatically mapping Chinese diagnoses from the disease names given by clinicians to the disease names in ICD-10. LCS refers to the longest string that is a subsequence of every member of a given set of strings. The improved LCS method proposed in this paper can increase the accuracy of Chinese disease name mapping.
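To make the LCS definition concrete, the following is a minimal sketch of the classic two-string LCS computed by dynamic programming, used to rank candidate disease names against a clinician's term. It is not the improved, semantics-aware algorithm the paper proposes, which the abstract does not specify. The function names, the normalization by the longer string's length, and the candidate list are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic DP)."""
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Assumed normalization: LCS length divided by the longer string's length."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


def best_match(term: str, candidates: Iterable[str]) -> Tuple[str, float]:
    """Return the candidate name with the highest LCS similarity to term."""
    return max(((c, lcs_similarity(term, c)) for c in candidates),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical candidate names, not taken from the paper or from an ICD-10 table.
    candidates = ["急性阑尾炎", "慢性胃炎", "急性支气管炎"]
    name, score = best_match("急性化脓性阑尾炎", candidates)
    print(name, score)
```

Because Python strings are sequences of Unicode code points, each Chinese character counts as one element, so the same routine works for Chinese disease names without word segmentation. Plain LCS only rewards shared characters in the same order; it cannot relate synonyms that share no characters, which is the kind of gap a semantic similarity component would have to address.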
