Presenting the framework of the whole slide image file Babel fish: An OCR-based file labeling tool.

Author information

Englert Nils, Schwab Constantin, Legnar Maximilian, Weis Cleo-Aron

Affiliations

Section Computational Pathology Heidelberg, Institute of Pathology Heidelberg, University Hospital Heidelberg, University of Heidelberg, Heidelberg, Germany.

Institute of Pathology Heidelberg, University Hospital Heidelberg, University of Heidelberg, Heidelberg, Germany.

Publication information

J Pathol Inform. 2024 Oct 23;15:100402. doi: 10.1016/j.jpi.2024.100402. eCollection 2024 Dec.

DOI:10.1016/j.jpi.2024.100402

PMID:39634381

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11616518/

Abstract

INTRODUCTION

Metadata extraction from digitized slides or whole slide image files is a frequent, laborious, and tedious task. In this work, we present a tool that automatically extracts all relevant slide information, such as case number, year, slide number, block number, and staining, from the macro-images of the scanned slide. We named the tool Babel fish because it helps translate the relevant information printed on the slide. The tool builds in some basic assumptions, for example about where on the slide certain information is located; these can be adapted to the conventions of the respective site. The extracted metadata can then be used to sort digital slides into databases or to link them with associated case IDs from laboratory information systems.
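To make the idea of location and layout assumptions concrete, the following is a minimal sketch of how OCR text from a label might be split into the fields named above. The label layout, the regular expression, and the names `LABEL_PATTERN`, `SlideMetadata`, and `parse_label` are all hypothetical; the abstract does not describe the actual label format or the tool's internal code.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical label layout (an assumption, not taken from the paper), e.g.
# "H 12345/23 A2 3 HE": case number / two-digit year, block ID, slide number, stain.
LABEL_PATTERN = re.compile(
    r"(?P<case>\d+)\s*/\s*(?P<year>\d{2})\s+"
    r"(?P<block>[A-Z]\d*)\s+"
    r"(?P<slide>\d+)\s+"
    r"(?P<stain>[A-Za-z0-9-]+)"
)


@dataclass(frozen=True)
class SlideMetadata:
    case_number: str
    year: str
    block: str
    slide_number: str
    staining: str


def parse_label(text: str) -> Optional[SlideMetadata]:
    """Return the parsed fields, or None if the text does not fit the assumed layout."""
    match = LABEL_PATTERN.search(text)
    if match is None:
        return None
    return SlideMetadata(
        case_number=match["case"],
        year=match["year"],
        block=match["block"],
        slide_number=match["slide"],
        staining=match["stain"].upper(),
    )
```

Adapting such a sketch to another site would mostly mean replacing the pattern, which is one way to read the statement that the assumptions can be adapted.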

MATERIAL AND METHODS

The tool is based on optical character recognition (OCR). For most information, the easyOCR tool is used. For the block number and for cases with insufficient results in the first OCR round, a second OCR pass with pytesseract is applied. Two datasets are used: one with 342 slides for tool development and another with 110 slides for testing.
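The fallback part of this two-round scheme can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the `min_confidence` threshold, and the `is_sufficient` callback are hypothetical, and the separate pytesseract pass for the block number is not modelled. The calls `easyocr.Reader(...).readtext(...)` and `pytesseract.image_to_string(...)` are the public entry points of those libraries; both are imported lazily so the control flow can be read and exercised without them installed.

```python
from typing import Callable, List, Tuple

# An OCR engine here is any callable mapping an image path to a list of text lines.
OcrEngine = Callable[[str], List[str]]


def easyocr_engine(image_path: str, min_confidence: float = 0.5) -> List[str]:
    """First round: easyOCR yields (bounding box, text, confidence) triples."""
    import easyocr  # lazy import; needs `pip install easyocr`

    reader = easyocr.Reader(["en"], gpu=False)
    results = reader.readtext(image_path)
    # min_confidence is an assumed cut-off, not a value from the paper.
    return [text for _bbox, text, conf in results if conf >= min_confidence]


def tesseract_engine(image_path: str) -> List[str]:
    """Second round: pytesseract returns one plain-text string."""
    import pytesseract  # lazy import; also needs the Tesseract binary
    from PIL import Image

    text = pytesseract.image_to_string(Image.open(image_path))
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_label(
    image_path: str,
    is_sufficient: Callable[[List[str]], bool],
    first: OcrEngine = easyocr_engine,
    second: OcrEngine = tesseract_engine,
) -> Tuple[List[str], str]:
    """Run the first engine; use the second only if the first result is insufficient."""
    lines = first(image_path)
    if is_sufficient(lines):
        return lines, "first"
    return second(image_path), "second"
```

What counts as "insufficient" is left to the caller here, for example a check that all expected fields could be parsed from the first-round text.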

RESULTS

For the testing set, the overall accuracy for retrieving all relevant information per slide is 0.982. Of note, the accuracy for most information parts is 1.000, whereas the accuracy for the block number detection is 0.982.
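The two kinds of accuracy reported here can be computed as in the short sketch below, which uses synthetic data. With 110 test slides, an accuracy of 0.982 is consistent with 108 slides being fully correct (108/110 ≈ 0.982), although the abstract does not state the counts. The field names and the toy records are assumptions made for illustration only.

```python
from typing import Dict, List

FIELDS = ("case_number", "year", "slide_number", "block", "staining")
Record = Dict[str, str]


def field_accuracy(predicted: List[Record], expected: List[Record], field: str) -> float:
    """Fraction of slides for which one field was read correctly."""
    correct = sum(p[field] == e[field] for p, e in zip(predicted, expected))
    return correct / len(expected)


def slide_accuracy(predicted: List[Record], expected: List[Record]) -> float:
    """Fraction of slides for which every field was read correctly."""
    correct = sum(
        all(p[f] == e[f] for f in FIELDS) for p, e in zip(predicted, expected)
    )
    return correct / len(expected)


# Synthetic example: 110 slides, two of them with a wrong block number.
expected = [
    {"case_number": str(i), "year": "23", "slide_number": "1",
     "block": "A1", "staining": "HE"}
    for i in range(110)
]
predicted = [dict(row) for row in expected]
predicted[0]["block"] = "A7"
predicted[1]["block"] = "B1"

overall = round(slide_accuracy(predicted, expected), 3)
block_only = round(field_accuracy(predicted, expected, "block"), 3)
```

In this toy setup the overall and block-number accuracies coincide because every error is a block-number error, which matches the pattern of the reported figures.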

CONCLUSION

The Babel fish tool can be used to rename vast amounts of whole slide image files in an image analysis pipeline. Furthermore, it could be an essential part of DICOM conversion pipelines, as it extracts relevant metadata like case number, year, block ID, and staining.
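As an illustration of the renaming use case, the sketch below builds new file names from already extracted metadata and returns a rename plan without touching the disk. The naming scheme, the `.svs` extension in the usage example, and the function names `build_name` and `plan_renames` are hypothetical and not taken from the paper.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def build_name(meta: Dict[str, str], suffix: str) -> str:
    # Hypothetical scheme: <year>_<case>_<block>_<slide>_<stain><suffix>
    return "{year}_{case_number}_{block}_{slide_number}_{staining}{suffix}".format(
        suffix=suffix, **meta
    )


def plan_renames(files: Dict[Path, Dict[str, str]]) -> List[Tuple[Path, Path]]:
    """Return (old, new) path pairs; raise if two files would get the same name."""
    plan: List[Tuple[Path, Path]] = []
    seen = set()
    for old, meta in files.items():
        new = old.with_name(build_name(meta, old.suffix))
        if new in seen:
            raise ValueError(f"duplicate target name: {new.name}")
        seen.add(new)
        plan.append((old, new))
    return plan
```

A caller could review the plan and then apply it with `old.rename(new)` for each pair. Keeping planning and execution separate makes it easier to catch collisions before any file is moved.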

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d98/11616518/2b88e00d3ec9/gr1.jpg
