Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests.

Author information

Maroco João, Silva Dina, Rodrigues Ana, Guerreiro Manuela, Santana Isabel, de Mendonça Alexandre

Affiliation

Unidade de Investigação em Psicologia e Saúde & Departamento de Estatística, ISPA - Instituto Universitário, Rua Jardim do Tabaco 44, 1149-041 Lisboa, Portugal.

Publication information

BMC Res Notes. 2011 Aug 17;4:299. doi: 10.1186/1756-0500-4-299.

DOI:10.1186/1756-0500-4-299

PMID:21849043

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3180705/

Abstract

BACKGROUND

Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures for Mild Cognitive Impairment (MCI), but it presently has limited value in predicting progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, such as Neural Networks, Support Vector Machines and Random Forests, can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) were compared with three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press's Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of the classification parameters obtained from 5-fold cross-validation were compared using Friedman's nonparametric test.
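To make the evaluation criteria concrete, here is a minimal plain-Python sketch (standard library only) of how accuracy, sensitivity, specificity, area under the ROC curve and Press's Q can be computed for a binary outcome. It assumes that the positive class (coded 1) means progression to dementia; the function names and toy data are hypothetical. The Press's Q expression is the usual textbook form Q = (N - nK)^2 / (N(K - 1)), with N cases, n correctly classified and K groups, referred to a chi-square distribution with 1 degree of freedom; that it matches the paper's exact computation is an assumption.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, TN, FP); `positive` marks the class of interest (assumed: progression to dementia)."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def classification_metrics(y_true, y_pred, positive=1):
    """Overall accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true, scores, positive=1):
    """AUC as P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def press_q(n_total, n_correct, n_groups=2):
    """Press's Q = (N - n*K)^2 / (N*(K - 1)), with an upper-tail p-value from chi-square(1)."""
    q = (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))
    p_value = math.erfc(math.sqrt(q / 2))  # P(chi-square with 1 df >= q)
    return q, p_value


# Hypothetical toy data: 1 = progressed to dementia, 0 = did not.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.3, 0.2, 0.1, 0.35, 0.05, 0.6]

print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
print(press_q(len(y_true), sum(t == p for t, p in zip(y_true, y_pred))))
```

Because Q grows with the squared distance from chance-level agreement, it is best read alongside the accuracy itself: a classifier that is systematically wrong also produces a large Q.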

RESULTS

Press's Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (median (Me) = 0.76) and area under the ROC curve (Me = 0.90). However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forests ranked second in overall accuracy (Me = 0.73), with a high area under the ROC curve (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with an acceptable area under the ROC curve (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed median overall classification accuracy above 0.63, but for most of them the median sensitivity was around or even below 0.5.
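The medians (Me) above summarize per-fold values from the 5-fold cross-validation, and the Background states that these distributions were compared with Friedman's nonparametric test. The sketch below, in plain Python with hypothetical fold accuracies (not the paper's data), ranks the classifiers within each fold, computes the Friedman chi-square statistic without a tie correction, and takes a p-value from the chi-square distribution with k - 1 degrees of freedom. The helper names are illustrative; with only five folds the chi-square approximation may be rough, so exact tables or a statistics package could be preferable in practice.

```python
import math
from statistics import median


def average_ranks(values):
    """Rank values in ascending order (1 = lowest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg_rank
        start = end + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-square probability P(X >= x) for integer df, using
    Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)."""
    x = max(x, 0.0)  # guard against tiny negative values from rounding
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def friedman_test(scores_by_classifier):
    """Friedman chi-square for k classifiers evaluated on the same n folds.
    Folds are the blocks; no correction for tied ranks is applied."""
    names = list(scores_by_classifier)
    k = len(names)
    n = len(scores_by_classifier[names[0]])
    rank_sums = dict.fromkeys(names, 0.0)
    for fold in range(n):
        fold_ranks = average_ranks([scores_by_classifier[c][fold] for c in names])
        for name, r in zip(names, fold_ranks):
            rank_sums[name] += r
    stat = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums.values()) - 3 * n * (k + 1)
    return stat, chi2_sf(stat, k - 1), rank_sums


# Hypothetical per-fold accuracies (illustrative only, not the paper's data).
fold_accuracy = {
    "A": [0.70, 0.72, 0.68, 0.75, 0.71],
    "B": [0.65, 0.66, 0.60, 0.70, 0.64],
    "C": [0.60, 0.61, 0.62, 0.63, 0.59],
}
medians = {name: median(vals) for name, vals in fold_accuracy.items()}
stat, p_value, rank_sums = friedman_test(fold_accuracy)
print(medians, round(stat, 3), round(p_value, 4))
```

Ranking within folds, rather than pooling all fold scores, is what makes the comparison paired: each fold acts as its own block, so fold-to-fold differences in difficulty do not inflate the differences between classifiers.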

CONCLUSIONS

When sensitivity, specificity and overall classification accuracy are all taken into account, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested for predicting dementia from several neuropsychological tests. These methods may be used to improve the accuracy, sensitivity and specificity of dementia predictions from neuropsychological testing.
