Data leakage in deep learning studies of translational EEG.

Authors

Brookshire Geoffrey, Kasper Jake, Blauch Nicholas M, Wu Yunan Charles, Glatt Ryan, Merrill David A, Gerrol Spencer, Yoder Keith J, Quirk Colin, Lucero Ché

Affiliations

SPARK Neuro Inc., New York, NY, United States.

Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States.

Publication details

Front Neurosci. 2024 May 3;18:1373515. doi: 10.3389/fnins.2024.1373515. eCollection 2024.

DOI:10.3389/fnins.2024.1373515
PMID:38765672
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11099244/
Abstract

A growing number of studies apply deep neural networks (DNNs) to recordings of human electroencephalography (EEG) to identify a range of disorders. In many studies, EEG recordings are split into segments, and each segment is randomly assigned to the training or test set. As a consequence, data from individual subjects appears in both the training and the test set. Could high test-set accuracy reflect data leakage from subject-specific patterns in the data, rather than patterns that identify a disease? We address this question by testing the performance of DNN classifiers using segment-based holdout (in which segments from one subject can appear in both the training and test set), and comparing this to their performance using subject-based holdout (where all segments from one subject appear exclusively in either the training set or the test set). In two datasets (one classifying Alzheimer's disease, and the other classifying epileptic seizures), we find that performance on previously unseen subjects is strongly overestimated when models are trained using segment-based holdout. Finally, we survey the literature and find that the majority of translational DNN-EEG studies use segment-based holdout. Most published DNN-EEG studies may dramatically overestimate their classification performance on new subjects.
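The two holdout schemes described above can be contrasted with a minimal, self-contained Python sketch. This is not the authors' pipeline: the toy data generator (`make_segments`), the Gaussian "fingerprint" model of subject-specific patterns, all helper names, and the 1-nearest-neighbour classifier (used here as a cheap stand-in for a DNN) are hypothetical choices made for illustration. Each simulated subject has a label that has no effect on its features, so any accuracy above chance can only come from recognising subjects, not disease.

```python
import random


def make_segments(n_subjects=100, segs_per_subject=6, n_features=8, seed=0):
    """Hypothetical toy data: a list of (subject_id, features, label) segments.

    Every subject gets a random 'fingerprint' shared by all of its segments,
    plus a diagnostic label that has NO influence on the features, so the
    data carry subject identity but no disease signal at all.
    """
    rng = random.Random(seed)
    segments = []
    for subject in range(n_subjects):
        label = subject % 2  # half "patients", half "controls"
        fingerprint = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for _ in range(segs_per_subject):
            # Each segment = subject fingerprint + small segment-level noise.
            features = [f + rng.gauss(0.0, 0.3) for f in fingerprint]
            segments.append((subject, features, label))
    return segments


def segment_holdout(segments, test_frac=0.2, seed=0):
    """Shuffle individual segments: one subject can land in both sets."""
    rng = random.Random(seed)
    shuffled = list(segments)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def subject_holdout(segments, test_frac=0.2, seed=0):
    """Shuffle subjects: all segments of a subject go to exactly one set."""
    rng = random.Random(seed)
    subjects = sorted({subject for subject, _, _ in segments})
    rng.shuffle(subjects)
    test_subjects = set(subjects[: int(len(subjects) * test_frac)])
    train = [seg for seg in segments if seg[0] not in test_subjects]
    test = [seg for seg in segments if seg[0] in test_subjects]
    return train, test


def overlapping_subjects(train, test):
    """Subjects contributing segments to both sets (empty means no such leakage)."""
    return {seg[0] for seg in train} & {seg[0] for seg in test}


def one_nn_accuracy(train, test):
    """Accuracy of a 1-nearest-neighbour classifier (stand-in for a DNN)."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    correct = 0
    for _, features, label in test:
        nearest = min(train, key=lambda seg: sq_dist(seg[1], features))
        correct += nearest[2] == label
    return correct / len(test)


if __name__ == "__main__":
    data = make_segments()
    for name, split in [("segment-based", segment_holdout),
                        ("subject-based", subject_holdout)]:
        train, test = split(data)
        print(f"{name}: {len(overlapping_subjects(train, test))} shared subjects, "
              f"accuracy {one_nn_accuracy(train, test):.2f}")
```

On this toy data the segment-based split should typically score near 100%, because a test segment's nearest neighbour is almost always another segment from the same subject and therefore carries the same label, while the subject-based split should land around chance (50%); the exact figures depend on the random seed.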

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a693/11099244/9bf76a6a6adb/fnins-18-1373515-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a693/11099244/49a1533bf23c/fnins-18-1373515-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a693/11099244/ea62c7f738d4/fnins-18-1373515-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a693/11099244/1dd88c0d2c1f/fnins-18-1373515-g0004.jpg

Similar articles

1
Data leakage in deep learning studies of translational EEG.
Front Neurosci. 2024 May 3;18:1373515. doi: 10.3389/fnins.2024.1373515. eCollection 2024.
2
Robust decoding of the speech envelope from EEG recordings through deep neural networks.
J Neural Eng. 2022 Jul 6;19(4). doi: 10.1088/1741-2552/ac7976.
3
Diagnosis of Alzheimer's disease and Mild Cognitive Impairment using EEG and Recurrent Neural Networks.
Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:3179-3182. doi: 10.1109/EMBC48229.2022.9871302.
4
High performance clean versus artifact dry electrode EEG data classification using Convolutional Neural Network transfer learning.
Clin Neurophysiol Pract. 2023 Apr 25;8:88-91. doi: 10.1016/j.cnp.2023.04.002. eCollection 2023.
5
Risks of feature leakage and sample size dependencies in deep feature extraction for breast mass classification.
Med Phys. 2021 Jun;48(6):2827-2837. doi: 10.1002/mp.14678. Epub 2021 Apr 12.
6
Seizure forecasting using minimally invasive, ultra-long-term subcutaneous EEG: Generalizable cross-patient models.
Epilepsia. 2023 Dec;64 Suppl 4(Suppl 4):S114-S123. doi: 10.1111/epi.17265. Epub 2022 May 4.
7
Epileptic seizure detection: a comparative study between deep and traditional machine learning techniques.
J Integr Neurosci. 2020 Mar 30;19(1):1-9. doi: 10.31083/j.jin.2020.01.24.
8
Decoding of the speech envelope from EEG using the VLAAI deep neural network.
Sci Rep. 2023 Jan 16;13(1):812. doi: 10.1038/s41598-022-27332-2.
9
Epileptic seizure detection with deep EEG features by convolutional neural network and shallow classifiers.
Front Neurosci. 2023 May 22;17:1145526. doi: 10.3389/fnins.2023.1145526. eCollection 2023.
10
Perception without preconception: comparison between the human and machine learner in recognition of tissues from histological sections.
Sci Rep. 2022 Sep 30;12(1):16420. doi: 10.1038/s41598-022-20012-1.

Cited by

1
How EEG preprocessing shapes decoding performance.
Commun Biol. 2025 Jul 10;8(1):1039. doi: 10.1038/s42003-025-08464-3.
2
Measuring electrophysiological changes induced by sub-concussive impacts due to soccer ball heading.
Front Neurol. 2025 Mar 6;16:1500796. doi: 10.3389/fneur.2025.1500796. eCollection 2025.
3
The NERVE-ML (neural engineering reproducibility and validity essentials for machine learning) checklist: ensuring machine learning advances neural engineering.
J Neural Eng. 2025 Mar 27;22(2):021002. doi: 10.1088/1741-2552/adbfbd.
4
Driving-Related Cognitive Abilities Prediction Based on Transformer's Multimodal Fusion Framework.
Sensors (Basel). 2024 Dec 31;25(1):174. doi: 10.3390/s25010174.
5
Toward improving reproducibility in neuroimaging deep learning studies.
Front Neurosci. 2024 Dec 2;18:1509358. doi: 10.3389/fnins.2024.1509358. eCollection 2024.
6
AI chatbots show promise but limitations on UK medical exam questions: a comparative performance study.
Sci Rep. 2024 Aug 14;14(1):18859. doi: 10.1038/s41598-024-68996-2.