
Risks of feature leakage and sample size dependencies in deep feature extraction for breast mass classification.

Affiliation

Department of Radiology, University of Michigan, Ann Arbor, MI, USA.

Publication information

Med Phys. 2021 Jun;48(6):2827-2837. doi: 10.1002/mp.14678. Epub 2021 Apr 12.

DOI:10.1002/mp.14678
PMID:33368376
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8601676/

Abstract

PURPOSE

Transfer learning is commonly used in deep learning for medical imaging to alleviate the problem of limited available data. In this work, we studied the risk of feature leakage and its dependence on sample size when using a pretrained deep convolutional neural network (DCNN) as a feature extractor for classifying breast masses in mammography.

METHODS

Feature leakage occurs when the training set is used for feature selection and classifier modeling while the cost function is guided by the validation performance or informed by the test performance. The high-dimensional feature space extracted from a pretrained DCNN suffers from the curse of dimensionality: if the validation set or test set can be reused without limit during algorithm development, feature subsets that yield excessively optimistic performance on that set can be found. We designed a simulation study to examine feature leakage when using a DCNN as a feature extractor for mass classification in mammography. A total of 4577 unique mass lesions were partitioned by patient into three sets: 3222 for training, 508 for validation, and 847 for independent testing. Three pretrained DCNNs (AlexNet, GoogLeNet, and VGG16) were first compared using the training set in fourfold cross-validation, and one was selected as the feature extractor. To assess generalization errors, the independent test set was sequestered as truly unseen cases. Training sets of various sizes were simulated by randomly drawing 10% to 75% of the available training set, in addition to using 100% of it. Three commonly used feature classifiers were evaluated: the linear discriminant, the support vector machine, and the random forest. A sequential feature selection method was used to find feature subsets that achieved a high area under the receiver operating characteristic curve (AUC) on the validation set. The extent of feature leakage and the impact of training set size were analyzed by comparing this performance with that on the unseen test set.
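
The leakage mechanism described above can be illustrated with a small, self-contained sketch. This is not the authors' pipeline. It assumes pure-noise features in place of DCNN features, a toy mean-difference projection in place of the three classifiers studied, and arbitrary set sizes. All function and variable names are hypothetical. Because the labels carry no signal, any validation AUC well above 0.5 comes only from selecting features against the validation set.

```python
import numpy as np

def auc(scores, labels):
    """AUC via the Mann-Whitney rank statistic (assumes no tied scores)."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def fit_mean_difference(X, y):
    """Toy linear classifier: weights are the difference of the class means."""
    return X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)

def forward_select(X_tr, y_tr, X_val, y_val, n_keep):
    """Greedy forward selection scored on the validation set (the leaky step)."""
    chosen = []
    for _ in range(n_keep):
        best_j, best_auc = None, -1.0
        for j in range(X_tr.shape[1]):
            if j in chosen:
                continue
            cols = chosen + [j]
            w = fit_mean_difference(X_tr[:, cols], y_tr)  # fit on training data
            a = auc(X_val[:, cols] @ w, y_val)            # score on validation data
            if a > best_auc:
                best_j, best_auc = j, a
        chosen.append(best_j)
    return chosen

rng = np.random.default_rng(0)
N_FEATURES = 200  # assumed size, standing in for a high-dimensional deep feature vector

def make_split(n):
    """Noise features with balanced labels: the true AUC of any model is 0.5."""
    X = rng.normal(size=(n, N_FEATURES))
    y = np.arange(n) % 2
    return X, y

X_tr, y_tr = make_split(200)
X_val, y_val = make_split(100)
X_te, y_te = make_split(200)   # sequestered: never touched during selection

cols = forward_select(X_tr, y_tr, X_val, y_val, n_keep=10)
w = fit_mean_difference(X_tr[:, cols], y_tr)
val_auc = auc(X_val[:, cols] @ w, y_val)
test_auc = auc(X_te[:, cols] @ w, y_te)
print(f"validation AUC = {val_auc:.2f}, sequestered test AUC = {test_auc:.2f}")
```

In this sketch the selected subset looks informative on the validation set and falls back toward chance on the sequestered set. That is the same qualitative pattern the study measures on real mammography features.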

RESULTS

All three classifiers showed a large generalization error between the validation set and the independent sequestered test set at all sample sizes. The generalization error decreased as the sample size increased. At 100% of the sample size, one classifier achieved an AUC as high as 0.91 on the validation set, while the corresponding performance on the unseen test set reached an AUC of only 0.72.
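
As a worked example, assume the generalization error is expressed as the difference between the validation AUC and the test AUC. The abstract does not state the exact formula, and the symbol $\Delta_{\mathrm{AUC}}$ is introduced here only for illustration. Under that assumption, the figures reported at 100% of the training set give:

```latex
\Delta_{\mathrm{AUC}}
  = \mathrm{AUC}_{\mathrm{validation}} - \mathrm{AUC}_{\mathrm{test}}
  = 0.91 - 0.72
  = 0.19
```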

CONCLUSIONS

Our results demonstrate that large generalization errors can occur in AI tools due to feature leakage. Without evaluation on unseen test cases, optimistically biased performance may be reported inadvertently, which can lead to unrealistic expectations and reduce confidence in clinical implementation.
