
Limited Number of Cases May Yield Generalizable Models, a Proof of Concept in Deep Learning for Colon Histology.

Authors

Holland Lorne, Wei Dongguang, Olson Kristin A, Mitra Anupam, Graff John Paul, Jones Andrew D, Durbin-Johnson Blythe, Mitra Ananya Datta, Rashidi Hooman H

Affiliations

Department of Pathology and Laboratory Medicine, University of California, Sacramento, CA, USA.

Division of Biostatistics, UC Davis Genome Center, Genome and Biomedical Sciences Facility, University of California, Davis, CA, USA.

Publication information

J Pathol Inform. 2020 Feb 21;11:5. doi: 10.4103/jpi.jpi_49_19. eCollection 2020.

DOI: 10.4103/jpi.jpi_49_19
PMID: 32175170
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7047745/

Abstract

BACKGROUND

Little is known about the effect of a minimum number of slides required in generating image datasets used to build generalizable machine-learning (ML) models. In addition, the assumption within deep learning is that the increased number of training images will always enhance accuracy and that the initial validation accuracy of the models correlates well with their generalizability. In this pilot study, we have been able to test the above assumptions to gain a better understanding of such platforms, especially when data resources are limited.

METHODS

Using 10 colon histology slides (5 carcinoma and 5 benign), we were able to acquire 1000 partially overlapping images (Dataset A) that were then trained and tested on three convolutional neural networks (CNNs), ResNet50, AlexNet, and SqueezeNet, to build a large number of unique models for a simple task of classifying colon histopathology into benign and malignant. Different quantities of images (10-1000) from Dataset A were used to construct >200 unique CNN models whose performances were individually assessed. The performance of these models was initially assessed using 20% of Dataset A's images (not included in the training phase) to acquire their initial validation accuracy (internal accuracy) followed by their generalization accuracy on Dataset B (a very distinct secondary test set acquired from public domain online sources).
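
The subset-size sweep described above can be sketched roughly as follows. This is only an illustration, not the authors' code: the helper name `make_split`, the image identifiers, the example subset sizes, the fixed seed, and the choice to carve the 20% hold-out from each sampled subset are all assumptions, since the abstract does not say how the hold-out images were drawn.

```python
import random


def make_split(image_ids, n_images, holdout_frac=0.2, seed=0):
    """Sample n_images ids, then split them into (train, internal_validation).

    Hypothetical helper: the 80/20 ratio follows the abstract, everything
    else (sampling scheme, seed handling) is an assumption.
    """
    if not 0 < n_images <= len(image_ids):
        raise ValueError("n_images must be between 1 and len(image_ids)")
    rng = random.Random(seed)
    subset = rng.sample(image_ids, n_images)  # sampling without replacement
    n_holdout = max(1, round(n_images * holdout_frac))
    # The first n_holdout ids are held out; the rest are used for training.
    return subset[n_holdout:], subset[:n_holdout]


# Hypothetical ids standing in for the 1000 images of Dataset A.
dataset_a = [f"img_{i:04d}" for i in range(1000)]

# Illustrative subset sizes inside the 10-1000 range given in the abstract.
for n in (10, 100, 500, 1000):
    train, val = make_split(dataset_a, n)
    print(n, len(train), len(val))
```

Each (train, validation) pair would then be passed to a CNN training routine, and the resulting model scored once on the held-out images and once on the external Dataset B.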

RESULTS

All CNNs showed similar peak internal accuracies (>97%) from the Dataset A test set. Peak accuracies for the external novel test set (Dataset B), an assessment of the ability to generalize, showed marked variation (ResNet50: 98%; AlexNet: 92%; and SqueezeNet: 80%). The models with the highest accuracy were not generated using the largest training sets. Further, a model's internal accuracy did not always correlate with its generalization accuracy. The results were obtained using an optimized number of cases and controls.
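
One way to check whether internal accuracy tracks generalization accuracy across a set of trained models is sketched below. The numbers in `runs` are made-up placeholders chosen only to exercise the code. They are not results from the study. The tuple layout and the `pearson` helper are likewise assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# PLACEHOLDER records, NOT data from the paper:
# (training-set size, internal accuracy, external accuracy)
runs = [
    (10, 0.70, 0.60),
    (100, 0.95, 0.90),
    (500, 0.98, 0.85),
    (1000, 0.97, 0.80),
]

best_internal = max(runs, key=lambda r: r[1])
best_external = max(runs, key=lambda r: r[2])
r = pearson([run[1] for run in runs], [run[2] for run in runs])

print("size with best internal accuracy:", best_internal[0])
print("size with best external accuracy:", best_external[0])
print("internal vs external correlation:", round(r, 3))
```

If the model chosen by internal accuracy differs from the one that scores best on the external set, as in these placeholder values, selecting models on internal validation alone would miss the better generalizer.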

CONCLUSIONS

Increasing the number of images in a training set does not always improve model accuracy, and significant numbers of cases may not always be needed for generalization, especially for simple tasks. Different CNNs reach peak accuracy with different training set sizes. Further studies are required to evaluate the above findings in more complex ML models prior to using such ancillary tools in clinical settings.
