Risk of data leakage in estimating the diagnostic performance of a deep-learning-based computer-aided system for psychiatric disorders.

Affiliations

Department of Electronics and Information Engineering, Korea University, Sejong, Republic of Korea.

Interdisciplinary Graduate Program for Artificial Intelligence Smart Convergence Technology, Korea University, Sejong, Republic of Korea.

Publication information

Sci Rep. 2023 Oct 3;13(1):16633. doi: 10.1038/s41598-023-43542-8.

DOI:10.1038/s41598-023-43542-8

PMID:37789047

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10547830/

Abstract

Deep-learning approaches with data augmentation have been widely used to develop neuroimaging-based computer-aided diagnosis (CAD) systems. To prevent the inflated diagnostic performance caused by data leakage, a correct cross-validation (CV) method should be employed, but this is still overlooked in recent deep-learning-based CAD studies. The goal of this study was to investigate the impact of correct and incorrect CV methods on the diagnostic performance of deep-learning-based CAD systems after data augmentation. To this end, resting-state electroencephalogram (EEG) data recorded from post-traumatic stress disorder patients and healthy controls were augmented using a cropping method with different window sizes. Four different CV approaches were used to estimate the diagnostic performance of the CAD system: subject-wise CV (sCV), overlapped sCV (osCV), trial-wise CV (tCV), and overlapped tCV (otCV). Diagnostic performance was evaluated using two deep-learning models based on convolutional neural networks. Data augmentation can increase performance with all CVs, but inflated diagnostic performance was observed when using the incorrect CVs (tCV and otCV) because of data leakage. Therefore, a correct CV (sCV or osCV) should be used when developing a deep-learning-based CAD system. We expect that our investigation can provide insight for researchers who plan to develop neuroimaging-based CAD systems for psychiatric disorders using deep-learning algorithms with data augmentation.
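The difference between subject-wise and trial-wise CV after cropping can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 10-subject setup, the window and step sizes, and the fold count are all hypothetical, and "overlapped" is assumed here to mean overlapping crop windows, which the abstract does not spell out. The sketch only tracks which subject each crop came from and counts how many test-fold subjects also appear in the training folds.

```python
import random

def crop_windows(n_samples, window, step):
    """Return (start, stop) index pairs for fixed-length crops of one recording."""
    return [(s, s + window) for s in range(0, n_samples - window + 1, step)]

def make_crops(subject_ids, n_samples, window, step):
    """Augment every subject's recording; each crop remembers its subject."""
    return [(sid, start, stop)
            for sid in subject_ids
            for start, stop in crop_windows(n_samples, window, step)]

def subject_wise_folds(crops, k, seed=0):
    """Assign subjects to k folds first, then place each crop by its subject."""
    subjects = sorted({sid for sid, _, _ in crops})
    random.Random(seed).shuffle(subjects)
    fold_of = {sid: i % k for i, sid in enumerate(subjects)}
    folds = [[] for _ in range(k)]
    for crop in crops:
        folds[fold_of[crop[0]]].append(crop)
    return folds

def trial_wise_folds(crops, k, seed=0):
    """Shuffle individual crops into k folds, ignoring subject identity."""
    shuffled = crops[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def leaked_subjects(folds, test_index):
    """Subjects that appear in both the test fold and the training folds."""
    test = {sid for sid, _, _ in folds[test_index]}
    train = {sid
             for i, fold in enumerate(folds) if i != test_index
             for sid, _, _ in fold}
    return test & train

# Hypothetical setup: 10 subjects, 1000 samples each, 200-sample windows.
subjects = [f"S{i:02d}" for i in range(10)]
crops_plain = make_crops(subjects, n_samples=1000, window=200, step=200)    # no overlap
crops_overlap = make_crops(subjects, n_samples=1000, window=200, step=100)  # 50% overlap

s_folds = subject_wise_folds(crops_overlap, k=5)
t_folds = trial_wise_folds(crops_overlap, k=5)

s_leaks = [len(leaked_subjects(s_folds, i)) for i in range(5)]
t_leaks = [len(leaked_subjects(t_folds, i)) for i in range(5)]
print("subject-wise: leaked subjects per fold:", s_leaks)
print("trial-wise:   leaked subjects per fold:", t_leaks)
```

With the subject-wise split, no subject contributes crops to both sides of a fold, so the leak count is zero by construction. With the trial-wise split, crops from the same recording are almost always scattered across training and test folds, which is the kind of leakage the abstract associates with inflated performance.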

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5f5/10547830/cf3a0d4c70db/41598_2023_43542_Fig1_HTML.jpg
