Automatic ICD-10 classification of cancers from free-text death certificates.

Authors

Koopman Bevan, Zuccon Guido, Nguyen Anthony, Bergheim Anton, Grayson Narelle

Affiliations

The Australian e-Health Research Centre, CSIRO, Brisbane, Australia.

Queensland University of Technology, Brisbane, Australia.

Publication information

Int J Med Inform. 2015 Nov;84(11):956-65. doi: 10.1016/j.ijmedinf.2015.08.004. Epub 2015 Aug 13.

DOI:10.1016/j.ijmedinf.2015.08.004

PMID:26323193

Abstract

OBJECTIVE

Death certificates provide an invaluable source for cancer mortality statistics; however, this value can only be realised if accurate, quantitative data can be extracted from certificates, an aim hampered by both the volume and variable nature of certificates written in natural language. This paper proposes an automatic classification system for identifying cancer-related causes of death from death certificates.

METHODS

Detailed features, including terms, n-grams and SNOMED CT concepts, were extracted from a collection of 447,336 death certificates. These features were used to train Support Vector Machine classifiers (one classifier for each cancer type). The classifiers were deployed in a cascaded architecture: the first level identified the presence of cancer (i.e., binary cancer/no cancer) and the second level identified the type of cancer (according to the ICD-10 classification system). A held-out test set was used to evaluate the effectiveness of the classifiers according to precision, recall and F-measure. In addition, detailed feature analysis was performed to reveal the characteristics of a successful cancer classification model.
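The cascaded architecture described above can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenisation, the function names (extract_features, linear_score, classify) and all weights and biases are hypothetical, hand-set values chosen for illustration, whereas the study trained Support Vector Machines on a large certificate collection and also used SNOMED CT concept features, which are omitted here. Each classifier is represented only by the decision function of a linear model (a weighted sum of the features present plus a bias). The ICD-10 codes C34 (bronchus and lung), C50 (breast) and C43 (malignant melanoma of skin) serve as example cancer types.

```python
from typing import Dict, List, Optional, Tuple


def extract_features(text: str, max_n: int = 2) -> List[str]:
    """Return lowercased terms and word n-grams (up to max_n words) from free text."""
    tokens = [t.strip(".,;:()") for t in text.lower().split()]
    tokens = [t for t in tokens if t]
    features = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[i:i + n]))
    return features


def linear_score(features: List[str], weights: Dict[str, float], bias: float) -> float:
    """Decision value of a linear model: weights of the distinct features present, plus bias."""
    return sum(weights.get(f, 0.0) for f in set(features)) + bias


# Hypothetical hand-set weights; a real system would learn these from training data.
CANCER_WEIGHTS = {"carcinoma": 2.0, "metastatic": 1.5, "cancer": 2.0, "melanoma": 2.0}
CANCER_BIAS = -1.0

# One binary model per cancer type, keyed by ICD-10 code (weights again illustrative).
TYPE_MODELS: Dict[str, Tuple[Dict[str, float], float]] = {
    "C34": ({"lung": 2.0, "lung cancer": 1.0, "bronchus": 2.0}, -1.0),
    "C50": ({"breast": 2.0, "breast carcinoma": 1.0}, -1.0),
    "C43": ({"melanoma": 2.0, "malignant melanoma": 1.0}, -1.0),
}

UNDETERMINED = "undetermined"


def classify(text: str) -> Optional[str]:
    """Two-level cascade: None for no cancer, else an ICD-10 code or UNDETERMINED."""
    features = extract_features(text)
    # Level 1: binary cancer / no cancer decision.
    if linear_score(features, CANCER_WEIGHTS, CANCER_BIAS) <= 0:
        return None
    # Level 2: score every per-type model and keep the best positive one.
    scores = {code: linear_score(features, w, b) for code, (w, b) in TYPE_MODELS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else UNDETERMINED


if __name__ == "__main__":
    for certificate in ["Metastatic lung cancer", "Acute myocardial infarction"]:
        print(certificate, "->", classify(certificate))
```

Running the second level only on certificates that pass the first keeps the per-type models from having to separate cancer from every non-cancer cause of death, which is one plausible motivation for a cascade.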

RESULTS

The system was highly effective at identifying cancer as the underlying cause of death (F-measure 0.94). The system was also effective at determining the type of cancer for common cancers (F-measure 0.7). Rare cancers, for which there was little training data, were difficult to classify accurately (F-measure 0.12). Factors influencing performance were the amount of training data and certain ambiguous cancers (e.g., those in the stomach region). The feature analysis revealed that a combination of features was important for cancer type classification, with SNOMED CT concept and oncology-specific morphology features proving the most valuable.
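For readers unfamiliar with the reported metric, the sketch below shows how precision, recall and F-measure can be computed for a single cancer type from gold and predicted labels. It assumes the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not state which variant was used. The function name and the toy labels are hypothetical and are not data from the study.

```python
from typing import Sequence, Tuple


def precision_recall_f1(gold: Sequence[str], predicted: Sequence[str],
                        positive: str) -> Tuple[float, float, float]:
    """Per-class precision, recall and balanced F-measure (F1) for one label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels for illustration only.
    gold = ["C34", "C34", "C50", "none", "C34"]
    predicted = ["C34", "C50", "C50", "C34", "C34"]
    for label in ["C34", "C50", "C43"]:
        print(label, precision_recall_f1(gold, predicted, label))
```

A class with few test examples can swing sharply on one or two errors, which is consistent with the low scores reported for rare cancers.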

CONCLUSION

The system proposed in this study provides automatic identification and characterisation of cancers from large collections of free-text death certificates. This allows organisations such as Cancer Registries to monitor and report on cancer mortality in a timely and accurate manner. In addition, the methods and findings are generally applicable beyond cancer classification and to other sources of medical text besides death certificates.
