
Inflated prediction accuracy of neuropsychiatric biomarkers caused by data leakage in feature selection.

Affiliations

Department of Electronics and Information, Korea University, 2511, Sejong-ro, Jochiwon-eup, Sejong-si, 30019, Republic of Korea.

Psychiatry Department, Ilsan Paik Hospital, Inje University, Goyang, Republic of Korea.

Publication information

Sci Rep. 2021 Apr 12;11(1):7980. doi: 10.1038/s41598-021-87157-3.

DOI: 10.1038/s41598-021-87157-3
PMID: 33846489
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8042090/
Abstract

In recent years, machine learning techniques have been frequently applied to uncovering neuropsychiatric biomarkers with the aim of accurately diagnosing neuropsychiatric diseases and predicting treatment prognosis. However, many studies either did not perform cross-validation (CV) when using machine learning techniques or performed CV incorrectly, leading to significantly biased results due to overfitting. The aim of this study is to investigate the impact of CV on the prediction performance of neuropsychiatric biomarkers, in particular when feature selection is performed on high-dimensional features. To this end, we evaluated prediction performance using both simulation data and actual electroencephalography (EEG) data. For both the simulation and the actual EEG data, the overall prediction accuracies obtained when feature selection was performed outside of CV were considerably higher than those obtained when feature selection was performed within CV. The difference between the prediction accuracies of the two feature selection approaches can be thought of as the amount of overfitting due to selection bias. Our results indicate the importance of using CV correctly to avoid biased estimates of the prediction performance of neuropsychiatric biomarkers.
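The contrast between selecting features outside CV and selecting them within CV can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not name a classifier, a selection criterion, or sample sizes, so the nearest-centroid classifier, the class-mean-difference score, and every name and default value below (`run_demo`, 40 samples, 2000 features, 10 selected features, 5 folds) are hypothetical choices made for illustration. The features are pure noise, so any accuracy above chance reflects selection bias rather than real signal.

```python
import random


def mean(values):
    return sum(values) / len(values)


def make_noise_data(n_samples, n_features, rng):
    """Pure-noise features with alternating labels: there is no true signal."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def select_features(X, y, rows, k):
    """Return the k features with the largest class-mean difference on `rows`."""
    scores = []
    for j in range(len(X[0])):
        m0 = mean([X[i][j] for i in rows if y[i] == 0])
        m1 = mean([X[i][j] for i in rows if y[i] == 1])
        scores.append(abs(m1 - m0))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def centroid(X, y, rows, feats, label):
    """Mean of each selected feature over the rows that carry `label`."""
    return [mean([X[i][j] for i in rows if y[i] == label]) for j in feats]


def cv_accuracy(X, y, k, n_folds, select_inside_cv):
    """k-fold CV accuracy of a nearest-centroid classifier (assumed classifier)."""
    n = len(X)
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    # Leaky variant: features are chosen once, using every sample and label,
    # including the samples that will later serve as test data.
    leaked = None if select_inside_cv else select_features(X, y, range(n), k)
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        # Correct variant: features are re-selected from the training fold only.
        feats = select_features(X, y, train, k) if select_inside_cv else leaked
        c0 = centroid(X, y, train, feats, 0)
        c1 = centroid(X, y, train, feats, 1)
        for i in test:
            d0 = sum((X[i][j] - c) ** 2 for j, c in zip(feats, c0))
            d1 = sum((X[i][j] - c) ** 2 for j, c in zip(feats, c1))
            correct += int((1 if d1 < d0 else 0) == y[i])
    return correct / n


def run_demo(seed=0, n_samples=40, n_features=2000, k=10, n_folds=5):
    rng = random.Random(seed)
    X, y = make_noise_data(n_samples, n_features, rng)
    outside = cv_accuracy(X, y, k, n_folds, select_inside_cv=False)
    inside = cv_accuracy(X, y, k, n_folds, select_inside_cv=True)
    return outside, inside


acc_outside, acc_inside = run_demo()
print(f"feature selection outside CV: {acc_outside:.2f}")
print(f"feature selection within CV:  {acc_inside:.2f}")
```

Because the labels carry no information about the features, an unbiased estimate should hover around chance level. The gap between the two printed accuracies is therefore a rough stand-in for what the abstract calls the amount of overfitting due to selection bias. The odd fold count is deliberate: with alternating labels it keeps both classes present in every test fold.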


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3951/8042090/7380bac851c3/41598_2021_87157_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3951/8042090/ad67653a216d/41598_2021_87157_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3951/8042090/4adbd08a69bc/41598_2021_87157_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3951/8042090/ecb2ee7b8159/41598_2021_87157_Fig4_HTML.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3951/8042090/acb5d1941095/41598_2021_87157_Fig5_HTML.jpg

