The impact of OCR accuracy on automated cancer classification of pathology reports.

Author information

Zuccon Guido, Nguyen Anthony N, Bergheim Anton, Wickman Sandra, Grayson Narelle

Affiliation

The Australian e-Health Research Centre, CSIRO ICT Centre, Brisbane, Australia.

Publication information

Stud Health Technol Inform. 2012;178:250-6.

Abstract

OBJECTIVE

To evaluate the effects of Optical Character Recognition (OCR) on the automatic cancer classification of pathology reports.

METHOD

Scanned images of pathology reports were converted to electronic free text using a commercial OCR system. A state-of-the-art cancer classification system, the Medical Text Extraction (MEDTEX) system, was used to automatically classify the OCR reports. The classifications MEDTEX produced on the OCR versions of the reports were compared with the classifications it produced on a human-amended version of the same OCR reports.
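
The comparison described above can be illustrated with a minimal sketch. It assumes each report has already been assigned a label twice, once from the raw OCR text and once from the human-amended text, and it measures how often the two labels agree. The function names and the example labels are hypothetical; the abstract does not describe MEDTEX's interface or the exact evaluation measure used.

```python
def classification_agreement(ocr_labels, amended_labels):
    """Fraction of reports whose label is the same for both text versions."""
    if len(ocr_labels) != len(amended_labels):
        raise ValueError("both label lists must cover the same reports")
    if not ocr_labels:
        raise ValueError("at least one report is required")
    matches = sum(a == b for a, b in zip(ocr_labels, amended_labels))
    return matches / len(ocr_labels)


def disagreements(ocr_labels, amended_labels):
    """Indices of reports where OCR errors changed the assigned label."""
    return [
        i
        for i, (a, b) in enumerate(zip(ocr_labels, amended_labels))
        if a != b
    ]


# Hypothetical labels for four reports (illustrative only, not study data).
ocr = ["notifiable", "non-notifiable", "notifiable", "notifiable"]
amended = ["notifiable", "non-notifiable", "non-notifiable", "notifiable"]

agreement = classification_agreement(ocr, amended)
changed = disagreements(ocr, amended)
```

Listing the disagreeing reports alongside the agreement rate makes it possible to inspect which OCR errors actually changed a classification.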

RESULTS

The OCR system recognised scanned pathology reports with up to 99.12% character accuracy and up to 98.95% word accuracy. Errors in the OCR processing were found to have minimal impact on the automatic classification of scanned pathology reports into notifiable groups. However, the impact of OCR errors is not negligible when extracting cancer notification items such as primary site and histological type.
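
For readers unfamiliar with these measures, the following is a minimal sketch of one common way to compute character and word accuracy: one minus the edit (Levenshtein) distance between the OCR output and a reference text, divided by the reference length. This definition is an assumption; the abstract does not state how the reported figures were calculated, and the example strings are invented for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def accuracy(ref, hyp):
    """1 - (edit distance / reference length), floored at zero."""
    if not ref:
        raise ValueError("reference must not be empty")
    return max(0.0, 1 - edit_distance(ref, hyp) / len(ref))


def character_accuracy(ref_text, ocr_text):
    return accuracy(ref_text, ocr_text)


def word_accuracy(ref_text, ocr_text):
    # Words are split on whitespace, a simplifying assumption.
    return accuracy(ref_text.split(), ocr_text.split())


# Invented example: the OCR engine misreads "l" as "1" in one word.
reference = "malignant melanoma of skin"
ocr_output = "malignant me1anoma of skin"

char_acc = character_accuracy(reference, ocr_output)
word_acc = word_accuracy(reference, ocr_output)
```

A single misread character lowers word accuracy more than character accuracy, which is consistent with the word-level figure in the results being the lower of the two.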

CONCLUSIONS

The automatic cancer classification system used in this work, MEDTEX, proved robust to the errors introduced when free-text pathology reports are acquired from scanned images through OCR software. However, issues emerge when extracting cancer notification items.
