The impact of OCR accuracy on automated cancer classification of pathology reports.

Authors

Zuccon Guido, Nguyen Anthony N, Bergheim Anton, Wickman Sandra, Grayson Narelle

Affiliation

The Australian e-Health Research Centre, CSIRO ICT Centre, Brisbane, Australia.

Publication

Stud Health Technol Inform. 2012;178:250-6.

PMID: 22797049

Abstract

OBJECTIVE

To evaluate the effects of Optical Character Recognition (OCR) on the automatic cancer classification of pathology reports.

METHOD

Scanned images of pathology reports were converted to electronic free text using a commercial OCR system. A state-of-the-art cancer classification system, the Medical Text Extraction (MEDTEX) system, was used to automatically classify the OCR reports. The classifications MEDTEX produced on the OCR versions of the reports were compared with its classifications of a human-amended version of the same OCR reports.
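The comparison described above can be sketched as a per-report agreement check between two label lists. This is a minimal illustration only: the abstract does not describe how MEDTEX output was scored, so the function name `compare_classifications` and the label strings `"notifiable"` and `"non-notifiable"` are assumptions, and the labels are toy values rather than data from the study.

```python
from collections import Counter


def compare_classifications(ocr_labels, amended_labels):
    """Compare labels assigned to raw OCR text against labels assigned
    to the human-amended text of the same reports (hypothetical helper).

    Returns the fraction of reports with identical labels and a Counter
    keyed by (amended_label, ocr_label) pairs.
    """
    if len(ocr_labels) != len(amended_labels):
        raise ValueError("both label lists must cover the same reports")

    pairs = list(zip(amended_labels, ocr_labels))
    # Agreement: share of reports where OCR errors did not change the label.
    agreement = sum(a == o for a, o in pairs) / len(pairs) if pairs else 0.0
    # Counter of label pairs acts as a small confusion table.
    confusion = Counter(pairs)
    return agreement, confusion


# Toy labels, invented for illustration (not results from the paper).
amended = ["notifiable", "non-notifiable", "non-notifiable", "notifiable"]
ocr = ["notifiable", "notifiable", "non-notifiable", "notifiable"]

agreement, confusion = compare_classifications(ocr, amended)
print(f"agreement: {agreement:.2f}")
for (amended_label, ocr_label), count in sorted(confusion.items()):
    print(f"amended={amended_label!r} ocr={ocr_label!r}: {count}")
```

Treating the human-amended version as the reference isolates the effect of OCR errors from the classifier's own mistakes, since both label sets come from the same system.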

RESULTS

The OCR system recognised scanned pathology reports with up to 99.12% character accuracy and up to 98.95% word accuracy. OCR errors were found to have minimal impact on the automatic classification of scanned pathology reports into notifiable groups. However, the impact of OCR errors is not negligible when extracting cancer notification items, such as primary site and histological type.
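Character and word accuracy figures like those above are commonly derived from an edit distance between the OCR output and a reference transcription. The sketch below assumes the definition accuracy = 1 - (edit distance / reference length); the abstract does not state which formula the authors used, and the function names and the sample strings are illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def accuracy(ref, hyp):
    """Assumed definition: 1 - edit_distance / len(ref), floored at 0."""
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def char_accuracy(ref_text, ocr_text):
    return accuracy(ref_text, ocr_text)


def word_accuracy(ref_text, ocr_text):
    # Whitespace tokenisation is an assumption made for this sketch.
    return accuracy(ref_text.split(), ocr_text.split())


# Invented example: the OCR engine reads the letter "l" as the digit "1".
reference = "invasive ductal carcinoma"
ocr_output = "invasive ducta1 carcinoma"

print(f"character accuracy: {char_accuracy(reference, ocr_output):.4f}")
print(f"word accuracy: {word_accuracy(reference, ocr_output):.4f}")
```

A single misread character costs one unit at the character level but invalidates a whole word at the word level, which is consistent with word accuracy being the lower of the two reported figures and with item extraction (which depends on specific terms) being more sensitive to OCR errors than coarse classification.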

CONCLUSIONS

The automatic cancer classification system used in this work, MEDTEX, proved robust to the errors introduced when free-text pathology reports are acquired from scanned images through OCR software. However, issues emerge when extracting cancer notification items.
