Limited Number of Cases May Yield Generalizable Models, a Proof of Concept in Deep Learning for Colon Histology.

Author Information

Holland Lorne, Wei Dongguang, Olson Kristin A, Mitra Anupam, Graff John Paul, Jones Andrew D, Durbin-Johnson Blythe, Mitra Ananya Datta, Rashidi Hooman H

Affiliations

Department of Pathology and Laboratory Medicine, University of California, Sacramento, CA, USA.

Division of Biostatistics, UC Davis Genome Center, Genome and Biomedical Sciences Facility, University of California, Davis, CA, USA.

Publication Information

J Pathol Inform. 2020 Feb 21;11:5. doi: 10.4103/jpi.jpi_49_19. eCollection 2020.

Abstract

BACKGROUND

Little is known about the minimum number of slides required to generate image datasets for building generalizable machine-learning (ML) models. In addition, deep learning commonly assumes that increasing the number of training images always improves accuracy and that a model's initial validation accuracy correlates well with its generalizability. In this pilot study, we tested these assumptions to better understand such platforms, especially when data resources are limited.

METHODS

Using 10 colon histology slides (5 carcinoma and 5 benign), we acquired 1000 partially overlapping images (Dataset A), which were then used to train and test three convolutional neural networks (CNNs), ResNet50, AlexNet, and SqueezeNet. This produced a large number of unique models for the simple task of classifying colon histopathology as benign or malignant. Different quantities of images (10-1000) from Dataset A were used to construct >200 unique CNN models, each of which was assessed individually. Each model was first evaluated on 20% of Dataset A's images (excluded from the training phase) to obtain its initial validation accuracy (internal accuracy), and then on Dataset B (a very distinct secondary test set acquired from public-domain online sources) to obtain its generalization accuracy.
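The data side of this protocol (hold out 20% of Dataset A, then train models on image subsets of increasing size) can be sketched in Python. This is a minimal sketch, not the authors' code: the stratified split, the nested class-balanced subsets, the fixed random seed, and the function names `split_holdout` and `training_subsets` are all assumptions. The abstract also does not say whether the 10-1000 image counts include the held-out 20%; this sketch assumes they do not and draws every training subset from the remaining 80% pool, so held-out images never enter training.

```python
import random
from collections import defaultdict


def split_holdout(labels, test_frac=0.2, seed=0):
    """Stratified, image-level hold-out split (assumed design).

    Returns (train_idx, test_idx) as sorted lists of indices into `labels`;
    no index appears in both lists.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)  # same held-out fraction per class
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def training_subsets(train_idx, labels, sizes, seed=0):
    """Nested, class-balanced training subsets drawn only from train_idx.

    Each size n yields n // n_classes images per class. Every subset is a
    prefix of the same shuffled per-class order, so smaller subsets are
    contained in larger ones (a hypothetical design choice).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in train_idx:
        by_class[labels[i]].append(i)
    for idx in by_class.values():
        rng.shuffle(idx)
    classes = sorted(by_class)
    subsets = {}
    for n in sizes:
        per_class = n // len(classes)
        short = [c for c in classes if len(by_class[c]) < per_class]
        if short:
            raise ValueError(f"not enough training images in classes {short} for n={n}")
        subsets[n] = sorted(i for c in classes for i in by_class[c][:per_class])
    return subsets


if __name__ == "__main__":
    labels = [0] * 500 + [1] * 500  # hypothetical: 0 = benign, 1 = carcinoma
    train_idx, test_idx = split_holdout(labels)
    subsets = training_subsets(train_idx, labels, sizes=[10, 50, 100, 400, 800])
    print(len(train_idx), len(test_idx), {n: len(s) for n, s in subsets.items()})
```

Each subset would then train one model per architecture, scored once on `test_idx` (internal accuracy) and once on Dataset B (generalization accuracy). Splitting by image rather than by slide follows the abstract's wording; because the tiles partially overlap, a slide-level split would be a stricter alternative.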

RESULTS

All CNNs showed similar peak internal accuracies (>97%) from the Dataset A test set. Peak accuracies for the external novel test set (Dataset B), an assessment of the ability to generalize, showed marked variation (ResNet50: 98%; AlexNet: 92%; and SqueezeNet: 80%). The models with the highest accuracy were not generated using the largest training sets. Further, a model's internal accuracy did not always correlate with its generalization accuracy. The results were obtained using an optimized number of cases and controls.

CONCLUSIONS

Increasing the number of images in a training set does not always improve model accuracy, and large numbers of cases may not always be needed for generalization, especially for simple tasks. Different CNNs reach peak accuracy with different training set sizes. Further studies are required to evaluate these findings in more complex ML models before such ancillary tools are used in clinical settings.
