Eye Center of the University Hospital Freiburg, Medical Faculty of the Albert-Ludwigs-University Freiburg, Freiburg, Germany.
Department of Ophthalmology, Asklepios Hospital Nord-Heidberg, Hamburg, Germany.
Sci Rep. 2024 Apr 19;14(1):9035. doi: 10.1038/s41598-024-59926-3.
Physicians' letters are the optimal source of diagnoses for registries. However, most registries require diagnosis codes such as ICD-10. We describe an algorithm that infers ICD-10 codes from German ophthalmologic physicians' letters and assess it in three German eye hospitals. The algorithm is based on the nearest-neighbor method and on a large thesaurus for ICD-10 codes. This thesaurus was embedded into a Word2Vec space created from anonymized physicians' reports of the first hospital. For evaluation, each of the three hospitals sent all diagnoses taken from 100 letters, and the senders evaluated the inferred ICD-10 codes for correctness. A total of 3332 natural language terms were submitted (812 from hospital one, 1473 from hospital two, 1047 from hospital three). After 526 non-diagnoses were excluded upfront, 2806 ICD-10 codes were inferred (771 for hospital one, 1226 for hospital two, 809 for hospital three). In the first hospital, 98% were fully correct and 99% were correct at the level of the superordinate disease concept. The corresponding percentages were 69% and 86% in hospital two, and 69% and 91% in hospital three. Our simple method is capable of inferring ICD-10 codes for German natural language diagnoses, especially when the embedding space has been built with physicians' letters from the same hospital. The method may yield sufficient accuracy for many tasks in the multi-centric setting and can easily be adapted to other languages and specialities.
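The nearest-neighbor lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three-dimensional vectors, the tiny thesaurus, the token-averaging phrase embedding, and the names `TOY_VECTORS`, `THESAURUS`, `embed`, `cosine`, and `infer_icd10` are all assumptions made for illustration. In the study, the vectors would come from a Word2Vec model trained on anonymized physicians' reports, and the thesaurus would be far larger.

```python
import math

# Hypothetical toy "word vectors"; a real system would load these from a
# Word2Vec model trained on physicians' letters.
TOY_VECTORS = {
    "katarakt": (0.90, 0.10, 0.00),
    "grauer": (0.80, 0.20, 0.10),
    "star": (0.85, 0.10, 0.05),
    "glaukom": (0.10, 0.90, 0.00),
    "offenwinkelglaukom": (0.15, 0.85, 0.05),
    "makuladegeneration": (0.00, 0.10, 0.90),
    "amd": (0.05, 0.15, 0.85),
}

# Hypothetical mini-thesaurus: German term -> ICD-10 code (illustrative only).
THESAURUS = {
    "katarakt": "H25.9",
    "grauer star": "H25.9",
    "glaukom": "H40.1",
    "makuladegeneration": "H35.3",
}


def embed(phrase):
    """Average the vectors of all known tokens; return None if none is known."""
    vecs = [TOY_VECTORS[t] for t in phrase.lower().split() if t in TOY_VECTORS]
    if not vecs:
        return None
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def infer_icd10(term):
    """Return the code of the thesaurus entry closest to `term`, or None."""
    query = embed(term)
    if query is None:
        return None  # no known token, so no inference is attempted
    nearest = max(THESAURUS, key=lambda entry: cosine(query, embed(entry)))
    return THESAURUS[nearest]


if __name__ == "__main__":
    for term in ("AMD", "Offenwinkelglaukom", "Grauer Star"):
        print(term, "->", infer_icd10(term))
```

A term that does not appear in the thesaurus, such as the abbreviation "AMD", is still mapped to a code because its vector lies close to that of a thesaurus entry. This also suggests why accuracy may depend on whether the embedding space was built from letters that share the vocabulary of the hospital being coded.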