

Tens of images can suffice to train neural networks for malignant leukocyte detection.

Affiliations

Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands.

Institute of Computational Biology, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg, Germany.

Publication information

Sci Rep. 2021 Apr 12;11(1):7995. doi: 10.1038/s41598-021-86995-5.

Abstract

Convolutional neural networks (CNNs) excel as powerful tools for biomedical image classification. It is commonly assumed that training CNNs requires large amounts of annotated data. This is a bottleneck in many medical applications, where annotation relies on expert knowledge. Here, we analyze the binary classification performance of a CNN on two independent cytomorphology datasets as a function of training set size. Specifically, we train a sequential model to discriminate non-malignant leukocytes from blast cells, whose appearance in the peripheral blood is a hallmark of leukemia. We systematically vary the training set size and find that tens of training images suffice for binary classification with an ROC-AUC above 90%. Saliency maps and layer-wise relevance propagation visualizations suggest that the network learns to focus increasingly on the nuclear structures of leukocytes as the number of training images grows. A low-dimensional t-SNE representation reveals that although the two classes are already separated with only a few training images, the distinction between them becomes clearer as more training images are used. To evaluate performance on a multi-class problem, we annotated single-cell images from an acute lymphoblastic leukemia dataset into six different hematopoietic classes. Multi-class prediction suggests that here, too, a few single-cell images suffice if the morphological differences between classes are large enough. Incorporating deep learning algorithms into clinical practice has the potential to reduce variability and cost, democratize access to expertise, and allow early detection of disease onset and relapse. Our approach evaluates the performance of a deep-learning-based cytology classifier with respect to the size and complexity of the training data and of the classification task.
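The evaluation described above scores a binary classifier by ROC-AUC while the number of training images varies. It can be sketched in plain Python. This is an illustrative sketch, not the authors' code. The function names `roc_auc` and `balanced_subsets` are hypothetical. Drawing class-balanced, nested training subsets is an assumed design choice. The CNN itself is omitted: any model that outputs a blast-cell score per test image could supply `scores`.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a randomly chosen positive (label 1,
    e.g. a blast cell) scores higher than a randomly chosen negative (label 0).
    Ties count as 1/2 (Mann-Whitney U formulation; O(P*N), kept simple)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_subsets(
    indices_by_class: Dict[str, List[int]],
    sizes: Sequence[int],
    seed: int = 0,
) -> Iterator[Tuple[int, List[int]]]:
    """Hypothetical helper: yield (n, training indices) with n images per class.
    Assumption: each class is shuffled once, so the subsets are nested
    (every larger training set contains the smaller ones)."""
    rng = random.Random(seed)
    shuffled = {c: rng.sample(idx, len(idx)) for c, idx in indices_by_class.items()}
    for n in sizes:
        yield n, [i for idx in shuffled.values() for i in idx[:n]]


if __name__ == "__main__":
    # Toy scores, not data from the paper.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    for n, train_idx in balanced_subsets({"blast": [0, 1, 2, 3], "normal": [4, 5, 6, 7]}, [1, 2]):
        print(n, len(train_idx))  # train a model on train_idx, then score the test set
```

In a learning-curve experiment of this kind, one would train a fresh model for each subset size and compute `roc_auc` on a fixed held-out test set. Nested subsets make successive sizes easier to compare, but independent resampling at each size is an equally valid choice.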


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3c24/8042012/a4b13682c20d/41598_2021_86995_Fig1_HTML.jpg
