

Automatic labels are as effective as manual labels in digital pathology images classification with deep learning.

Authors

Marini Niccolo, Marchesin Stefano, Ferris Lluis Borras, Püttmann Simon, Wodzinski Marek, Fratti Riccardo, Podareanu Damian, Caputo Alessandro, Boytcheva Svetla, Vatrano Simona, Fraggetta Filippo, Nagtegaal Iris, Silvello Gianmaria, Atzori Manfredo, Müller Henning

Affiliations

Information Systems Institute, University of Applied Sciences Western Switzerland (HES-SO Valais), Sierre, Switzerland.

Department of Information Engineering, University of Padua, Padua, Italy.

Publication information

J Pathol Inform. 2025 Jul 22;18:100462. doi: 10.1016/j.jpi.2025.100462. eCollection 2025 Aug.

Abstract

The increasing availability of biomedical data is helping to design more robust deep learning (DL) algorithms to analyze biomedical samples. Currently, one of the main limitations to training DL algorithms for a specific task is the need for medical experts to manually label the data. Automatic methods to label data exist; however, automatic labels can be noisy, and it is not completely clear in which situations they can be used to train DL models. This paper investigates under which circumstances automatic labels can be used to train a DL model for the classification of whole slide images (WSIs). The analysis involves multiple architectures, such as convolutional neural networks and vision transformers, and 10,604 WSIs as training data, collected from three use cases: celiac disease, lung cancer, and colon cancer, which include binary, multiclass, and multilabel data, respectively. The results identify 10% as the percentage of noisy labels that can be tolerated before performance drops off: within this limit, effective WSI classification models can still be trained, reaching F1-scores of 0.906, 0.757, and 0.833 on the three use cases, respectively. An algorithm generating automatic labels therefore needs to stay within this range to be adopted, as shown by applying the Semantic Knowledge Extractor Tool to automatically extract concepts and use them as labels. In this case, automatic labels are as effective as manual labels, achieving performance comparable to that obtained by training models with manual labels.
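To make the 10% noise threshold concrete, the following is a minimal sketch of how a controlled fraction of label noise could be injected into a binary label set, together with a plain F1-score computation. It is an illustration only: the abstract does not describe the study's actual noise-injection protocol, and the helper names `flip_labels` and `f1_score`, the symmetric-flip scheme, and the toy labels are all assumptions.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Return a copy of binary (0/1) labels with a fixed fraction flipped.

    Assumption: symmetric noise, i.e. every label is equally likely to be
    chosen for flipping, regardless of its class.
    """
    rng = random.Random(seed)
    n_flip = round(len(labels) * noise_rate)
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


def f1_score(y_true, y_pred):
    """F1-score for the positive class (label 1) of a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy "manual" labels (hypothetical data, not from the study).
clean = [i % 2 for i in range(1000)]

# Simulated "automatic" labels with 10% of the entries flipped.
noisy = flip_labels(clean, noise_rate=0.10)

# Fraction of labels that actually differ from the clean set.
realized_noise = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
print(realized_noise)
```

In an experiment of the kind the abstract describes, a model would be trained on `noisy` and evaluated against clean test labels with `f1_score`, repeating the procedure for increasing values of `noise_rate` to locate the point where performance starts to degrade.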


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4131/12391760/d24bac97fc76/gr1.jpg
