Machine learning classification of surgical pathology reports and chunk recognition for information extraction noise reduction.

Author information

Napolitano Giulio, Marshall Adele, Hamilton Peter, Gavin Anna T

Affiliations

Institut für Medizinische Biometrie, Informatik und Epidemiologie (IMBIE), Universität Bonn, Haus 325/11/1.OG/Raum 620, Sigmund-Freud-Straße 25, 53105 Bonn, Germany.

Queen's University Belfast, School of Mathematics and Physics, University Road, Belfast BT7 1NN, United Kingdom.

Publication information

Artif Intell Med. 2016 Jun;70:77-83. doi: 10.1016/j.artmed.2016.06.001. Epub 2016 Jun 8.

DOI:10.1016/j.artmed.2016.06.001

PMID:27431038

Abstract

BACKGROUND AND AIMS

Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging.

MATERIALS AND METHODS

The first technique was implemented using the freely available software RapidMiner to classify the reports according to their general layout as either 'semi-structured' or 'unstructured'. The second technique was developed using the open source language engineering framework GATE and aimed at predicting which chunks of the report text contain information about the cancer morphology, the tumour size, its hormone receptor status and the number of positive nodes. The classifiers were trained on 635 and tested on 163 manually classified or annotated reports from the Northern Ireland Cancer Registry.
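As a rough illustration of the first technique, the sketch below classifies reports by layout with a k-nearest neighbour vote over binary term occurrence vectors, applying a stopword filter and document-frequency pruning. It is plain Python, not the authors' RapidMiner process. The stopword list, the pruning threshold (min_df), the value of k, the Jaccard distance and the toy reports are all assumptions made for the example; the abstract does not specify any of them.

```python
from collections import Counter

# Hypothetical stopword list; the abstract does not say which list was used.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "with", "was"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (tok.strip(".,:;()") for tok in text.lower().split())
    return [tok for tok in tokens if tok and tok not in STOPWORDS]


def build_vocabulary(docs, min_df=2):
    """Pruning: keep only terms found in at least min_df documents (assumed threshold)."""
    doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term for term, count in doc_freq.items() if count >= min_df}


def binary_vector(text, vocab):
    """Binary term occurrence: record only whether a term is present, as a set."""
    return set(tokenize(text)) & vocab


def jaccard_distance(a, b):
    """Distance between two binary vectors (an assumed choice of measure)."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def knn_predict(text, train_vectors, vocab, k=3):
    """Majority vote among the k training reports closest to the query."""
    query = binary_vector(text, vocab)
    nearest = sorted(train_vectors, key=lambda item: jaccard_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented toy reports, for illustration only.
TRAIN_REPORTS = [
    ("Tumour size: 22 mm. Grade: 2. Nodes positive: 0. ER status: positive.", "semi-structured"),
    ("Tumour size: 15 mm. Grade: 3. Nodes positive: 2. ER status: negative.", "semi-structured"),
    ("Tumour size: 30 mm. Grade: 1. Nodes positive: 1. ER status: positive.", "semi-structured"),
    ("The specimen shows an invasive ductal carcinoma measuring about 22 mm and "
     "two of the nodes examined contain metastatic carcinoma.", "unstructured"),
    ("Sections show an invasive lobular carcinoma measuring about 15 mm and "
     "none of the nodes examined contain tumour.", "unstructured"),
    ("The specimen contains an invasive ductal carcinoma measuring 30 mm, and "
     "one of the nodes examined shows metastatic carcinoma.", "unstructured"),
]

VOCAB = build_vocabulary([text for text, _ in TRAIN_REPORTS])
TRAIN_VECTORS = [(binary_vector(text, VOCAB), label) for text, label in TRAIN_REPORTS]

QUERY_SEMI = "Tumour size: 18 mm. Grade: 2. Nodes positive: 3. ER status: negative."
QUERY_FREE = ("The specimen shows an invasive carcinoma measuring 18 mm; "
              "none of the nodes examined contain tumour.")

print(knn_predict(QUERY_SEMI, TRAIN_VECTORS, VOCAB))
print(knn_predict(QUERY_FREE, TRAIN_VECTORS, VOCAB))
```

Representing each report as a set of terms keeps the binary occurrence idea explicit: term counts are discarded, so a field label such as "Grade" contributes equally whether it appears once or several times.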

RESULTS

The layout classifier achieved its best result, 99.4% accuracy with only one semi-structured report predicted as unstructured, using the k-nearest neighbour algorithm with the binary term occurrence word vector type, a stopword filter and pruning. For chunk recognition, the best results were found using the PAUM algorithm with the same parameters for all cases, except for the prediction of chunks containing cancer morphology. For semi-structured reports, precision ranged from 0.97 to 0.94 and recall from 0.92 to 0.83, while for unstructured reports precision ranged from 0.91 to 0.64 and recall from 0.68 to 0.41. Poor results were found when the classifier was trained on semi-structured reports but tested on unstructured reports.
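The metrics quoted above can be made concrete with a small sketch. The code below computes precision and recall for a set of predicted chunks against manually annotated ones, and checks that a single misclassified report in a test set of 163 is consistent with the reported 99.4% accuracy. The chunk identifiers are invented, exact matching of chunks is an assumption (the abstract does not say how matches were scored), and so is the reading that the accuracy was measured on the 163 test reports with exactly one error.

```python
def precision_recall(predicted, gold):
    """Precision and recall over sets of chunk identifiers (exact match assumed)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def accuracy(n_correct, n_total):
    """Fraction of reports whose layout was predicted correctly."""
    return n_correct / n_total


# Invented example: chunks 2, 5, 7 and 9 are annotated as containing tumour size;
# the classifier predicts chunks 2, 5 and 8.
gold_chunks = {2, 5, 7, 9}
predicted_chunks = {2, 5, 8}
precision, recall = precision_recall(predicted_chunks, gold_chunks)
print(f"precision={precision:.2f} recall={recall:.2f}")

# Assumption: one error among the 163 test reports.
layout_accuracy = accuracy(163 - 1, 163)
print(f"layout accuracy={layout_accuracy * 100:.1f}%")
```

In this toy case two of the three predicted chunks are correct and two of the four annotated chunks are found, so precision is higher than recall, the same direction as the gap reported for unstructured reports.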

CONCLUSIONS

These results show that it is possible and beneficial to predict the layout of reports, and that the accuracy with which report segments containing particular information can be predicted is sensitive to both the report layout and the type of information sought.

Similar articles

PDF text classification to leverage information extraction from publication reports.

J Biomed Inform. 2016 Jun;61:141-8. doi: 10.1016/j.jbi.2016.03.026. Epub 2016 Apr 1.

Efficient identification of nationally mandated reportable cancer cases using natural language processing and machine learning.

J Am Med Inform Assoc. 2016 Nov;23(6):1077-1084. doi: 10.1093/jamia/ocw006. Epub 2016 Mar 28.

Performance of a Machine Learning Classifier of Knee MRI Reports in Two Large Academic Radiology Practices: A Tool to Estimate Diagnostic Yield.

AJR Am J Roentgenol. 2017 Apr;208(4):750-753. doi: 10.2214/AJR.16.16128. Epub 2017 Jan 31.

Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks.

J Am Med Inform Assoc. 2020 Jan 1;27(1):89-98. doi: 10.1093/jamia/ocz153.

Machine learning to parse breast pathology reports in Chinese.

Breast Cancer Res Treat. 2018 Jun;169(2):243-250. doi: 10.1007/s10549-018-4668-3. Epub 2018 Jan 29.

Natural language processing and machine learning to enable automatic extraction and classification of patients' smoking status from electronic medical records.

Ups J Med Sci. 2020 Nov;125(4):316-324. doi: 10.1080/03009734.2020.1792010. Epub 2020 Jul 22.

Large scale biomedical texts classification: a kNN and an ESA-based approaches.

J Biomed Semantics. 2016 Jun 16;7:40. doi: 10.1186/s13326-016-0073-1.

Machine learning application for incident prostate adenocarcinomas automatic registration in a French regional cancer registry.

Int J Med Inform. 2020 Jul;139:104139. doi: 10.1016/j.ijmedinf.2020.104139. Epub 2020 Apr 9.

Using machine learning to parse breast pathology reports.

Breast Cancer Res Treat. 2017 Jan;161(2):203-211. doi: 10.1007/s10549-016-4035-1. Epub 2016 Nov 8.

Cited by

Prediction model of dental caries in 12-year-old children in Sichuan Province based on machine learning.

Hua Xi Kou Qiang Yi Xue Za Zhi. 2023 Dec 1;41(6):686-693. doi: 10.7518/hxkq.2023.2023124.

Machine learning application identifies plasma markers for proteinuria in metastatic colorectal cancer patients treated with Bevacizumab.

Cancer Chemother Pharmacol. 2024 Jun;93(6):587-593. doi: 10.1007/s00280-024-04655-7. Epub 2024 Feb 25.

Cohort Identification from Free-Text Clinical Notes Using SNOMED CT's Hierarchical Semantic Relations.

AMIA Annu Symp Proc. 2023 Apr 29;2022:349-358. eCollection 2022.

Automatic Classification of Cancer Pathology Reports: A Systematic Review.

J Pathol Inform. 2022 Jan 20;13:100003. doi: 10.1016/j.jpi.2022.100003. eCollection 2022.

A Deep Neural Network for Early Detection and Prediction of Chronic Kidney Disease.

Diagnostics (Basel). 2022 Jan 5;12(1):116. doi: 10.3390/diagnostics12010116.

Prediction Models of Early Childhood Caries Based on Machine Learning Algorithms.

Int J Environ Res Public Health. 2021 Aug 15;18(16):8613. doi: 10.3390/ijerph18168613.

Artificial intelligence (AI) in medicine, current applications and future role with special emphasis on its potential and promise in pathology: present and future impact, obstacles including costs and acceptance among pathologists, practical and philosophical considerations. A comprehensive review.

Diagn Pathol. 2021 Mar 17;16(1):24. doi: 10.1186/s13000-021-01085-4.

Applications of Machine Learning Predictive Models in the Chronic Disease Diagnosis.

J Pers Med. 2020 Mar 31;10(2):21. doi: 10.3390/jpm10020021.

Clinical Text Data in Machine Learning: Systematic Review.

JMIR Med Inform. 2020 Mar 31;8(3):e17984. doi: 10.2196/17984.

Obtaining Knowledge in Pathology Reports Through a Natural Language Processing Approach With Classification, Named-Entity Recognition, and Relation-Extraction Heuristics.

JCO Clin Cancer Inform. 2019 Aug;3:1-8. doi: 10.1200/CCI.19.00008.
