School of Automation, Southeast University, 2nd Sipailou Road, Nanjing, China.
Comput Methods Programs Biomed. 2021 Nov;212:106464. doi: 10.1016/j.cmpb.2021.106464. Epub 2021 Oct 13.
Recognizing different tissue components is one of the most fundamental and essential tasks in digital pathology. Current methods are often based on convolutional neural networks (CNNs), which need numerous annotated samples for training. Creating large-scale histopathological datasets is labor-intensive, and interactive data annotation is a potential solution.
We propose DELR (Deep Embedding-based Logistic Regression) to enable rapid model training and inference for histopathological image analysis. DELR uses a pretrained CNN to encode images as compact embeddings at low computational cost. The embeddings are then used to train a logistic regression model efficiently. We implemented DELR in an active learning framework and validated it on three histopathological problems (binary, 4-category, and 8-category classification for lung, breast, and colorectal cancer, respectively). We also investigated the influence of the active learning strategy and the encoder type.
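The pipeline described above (fixed embeddings, a logistic regression classifier, and an active learning loop) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: synthetic Gaussian vectors stand in for CNN embeddings, the classifier is a small NumPy softmax regression trained by gradient descent, least-confidence sampling is just one possible query strategy, and all function names and hyperparameters are hypothetical.

```python
import numpy as np


def predict_proba(X, W, b):
    """Softmax class probabilities for embeddings X."""
    z = X @ W + b
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_logreg(X, y, n_classes, lr=1.0, epochs=300, l2=1e-4):
    """Multinomial logistic regression fitted by batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        G = (predict_proba(X, W, b) - Y) / n  # gradient w.r.t. the logits
        W -= lr * (X.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return W, b


def least_confident(P, k):
    """Positions of the k samples whose top-class probability is lowest."""
    return np.argsort(P.max(axis=1))[:k]


def active_learning(emb, oracle_labels, n_classes, seed_idx, rounds=5, batch=10):
    """Repeatedly train on the labeled set and query uncertain samples."""
    labeled = list(seed_idx)
    for _ in range(rounds):
        W, b = fit_logreg(emb[labeled], oracle_labels[labeled], n_classes)
        pool = np.setdiff1d(np.arange(len(emb)), labeled)
        P = predict_proba(emb[pool], W, b)
        query = pool[least_confident(P, batch)]
        # In an interactive tool a human would label these; here the
        # known synthetic labels play the role of the annotator.
        labeled.extend(query.tolist())
    W, b = fit_logreg(emb[labeled], oracle_labels[labeled], n_classes)
    return W, b, labeled


# Synthetic stand-in for pre-encoded image patches (assumption: a real
# pipeline would compute these once with a pretrained CNN encoder).
rng = np.random.default_rng(0)
n_classes, dim, per_class = 4, 32, 200
centers = rng.normal(scale=3.0, size=(n_classes, dim))
labels = np.repeat(np.arange(n_classes), per_class)
emb = centers[labels] + rng.normal(size=(len(labels), dim))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize embeddings

seed_idx = [int(np.flatnonzero(labels == c)[0]) for c in range(n_classes)]
W, b, labeled = active_learning(emb, labels, n_classes, seed_idx)
accuracy = float((predict_proba(emb, W, b).argmax(axis=1) == labels).mean())
print(f"labeled {len(labeled)} of {len(emb)} samples, accuracy {accuracy:.3f}")
```

Because the embeddings are computed once and only the small linear classifier is refit in each round, every iteration of the loop is cheap, which is the property that makes this kind of design attractive for interactive annotation.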
On all three datasets, DELR achieves an area under the curve (AUC) higher than 0.95 with only 100 image patches per class. Although its AUC is slightly lower than that of a fine-tuned CNN counterpart, DELR can be 536, 316, and 1481 times faster, respectively, after pre-encoding. Moreover, DELR proved compatible with a variety of active learning strategies and encoders.
DELR can achieve accuracy comparable to that of a CNN while running much faster. These advantages make it a potential solution for real-time interactive data annotation.