Empirical Study of Overfitting in Deep Learning for Predicting Breast Cancer Metastasis.

Authors

Xu Chuhan, Coen-Pirani Pablo, Jiang Xia

Affiliation

Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15217, USA.

Publication information

Cancers (Basel). 2023 Mar 25;15(7):1969. doi: 10.3390/cancers15071969.

DOI: 10.3390/cancers15071969

PMID: 37046630

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10093528/

Abstract

Overfitting may reduce the accuracy of predictions on future data because it weakens generalization. In this research, we used an electronic health records (EHR) dataset concerning breast cancer metastasis to study overfitting in deep feedforward neural network (FNN) prediction models. We studied how each hyperparameter, and some interesting pairs of hyperparameters, interact to influence model performance and overfitting. The 11 hyperparameters we studied were activation function, weight initializer, number of hidden layers, learning rate, momentum, decay, dropout rate, batch size, epochs, L1, and L2. Our results show that most of the single hyperparameters are either negatively or positively correlated with model prediction performance and overfitting. In particular, we found that overfitting overall tends to correlate negatively with learning rate, decay, batch size, and L2, but positively with momentum, epochs, and L1. According to our results, learning rate, decay, and batch size may have a more significant impact on both overfitting and prediction performance than most of the other hyperparameters, including L1, L2, and dropout rate, which are designed to minimize overfitting. We also found some interesting interacting pairs of hyperparameters, such as learning rate and momentum, learning rate and decay, and batch size and epochs.
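The abstract reports correlations between individual hyperparameters and overfitting but does not state here how overfitting is measured. The sketch below assumes one common choice, the gap between training AUC and test AUC, and shows how such a gap could be correlated with a single hyperparameter (here, learning rate) across grid-search runs. It is a minimal illustration in plain Python, not the authors' code: the function names, the use of Pearson correlation, and the toy run records are all hypothetical.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (tied scores count as 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Fraction of (positive, negative) pairs ranked correctly.
    return wins / (len(pos) * len(neg))


def overfitting_gap(train_auc: float, test_auc: float) -> float:
    """Assumed overfitting measure: how far training AUC exceeds test AUC."""
    return train_auc - test_auc


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between a hyperparameter and a metric across runs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical grid-search records: (learning_rate, train_auc, test_auc).
    # Made-up numbers for illustration only; not results from the study.
    runs = [(0.001, 0.95, 0.70), (0.01, 0.90, 0.74), (0.1, 0.82, 0.77)]
    lrs = [lr for lr, _, _ in runs]
    gaps = [overfitting_gap(tr, te) for _, tr, te in runs]
    print("corr(learning rate, overfitting gap):", pearson(lrs, gaps))
```

A negative coefficient in this toy example mirrors the kind of relationship the abstract describes for learning rate; with real grid-search output one would compute it over many runs, and a rank-based (Spearman) correlation may be more robust to the log-spaced values typical of learning-rate grids.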





