Lee Ahhyun Lucy, Hwang Eunjin, Hwang Jeongeun
Department of Life Sciences, POSTECH, Pohang, Gyeongsangbukdo, 37673, Republic of Korea; Research Center, Lablup Inc., Seoul, 06161, Republic of Korea; Department of Brain and Cognitive Sciences, Seoul National University, Seoul, 08826, Republic of Korea.
Research Center, Lablup Inc., Seoul, 06161, Republic of Korea.
Comput Biol Med. 2025 Jun;192(Pt B):110248. doi: 10.1016/j.compbiomed.2025.110248. Epub 2025 May 12.
Given the prevalence of Alzheimer's disease (AD) in the aging population and the limited treatment options, early detection is a crucial focus, and electroencephalography (EEG) offers a promising diagnostic tool. Several studies have reported EEG-based models with high diagnostic power for distinguishing patients with AD from healthy controls (HC). However, exploration of which features play a crucial role in the diagnosis remains limited.
This study investigates the diagnostic capability of EEG for distinguishing patients with AD from HCs through random forest classification of EEG features. Band power and cross-correlation were calculated from a resting-state EEG dataset of 22 HCs and 160 patients with AD using Welch's periodogram and Pearson's correlation, respectively. Welch's t-test was applied to identify features that differed significantly between patients with AD and HCs. Band power and cross-correlation were then analyzed using a random forest classifier (RFC) and feature-importance analysis. The importance of feature categories, defined as subsets of features grouped by frequency band (for band power features) or brain region (for cross-correlation features), was quantified by calculating their average occurrence across all hyperparameter configurations.
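The feature-extraction steps described above can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes SciPy's `welch` for the periodogram, `numpy.corrcoef` for Pearson correlation between channels, and `ttest_ind(..., equal_var=False)` for Welch's t-test. Only the theta band (4-8 Hz) comes from the abstract. The other band edges, the sampling rate, the segment length, the group sizes, and the synthetic signals are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.signal import welch
from scipy.stats import ttest_ind

# Theta (4-8 Hz) is taken from the abstract; the other band edges are assumptions.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}


def relative_band_power(eeg, fs, bands=BANDS):
    """Relative band power per channel for eeg of shape (n_channels, n_samples)."""
    freqs, psd = welch(eeg, fs=fs, nperseg=int(2 * fs), axis=-1)
    lo_all = min(lo for lo, _ in bands.values())
    hi_all = max(hi for _, hi in bands.values())
    total = psd[:, (freqs >= lo_all) & (freqs < hi_all)].sum(axis=-1)
    return {
        name: psd[:, (freqs >= lo) & (freqs < hi)].sum(axis=-1) / total
        for name, (lo, hi) in bands.items()
    }


def channel_correlation(eeg):
    """Pearson correlation matrix between channels, shape (n_channels, n_channels)."""
    return np.corrcoef(eeg)


# Synthetic stand-in data (hypothetical): a 6 Hz oscillation plus white noise.
rng = np.random.default_rng(0)
fs = 250  # assumed sampling rate in Hz
t = np.arange(10 * fs) / fs


def synth(theta_amp, n_channels=4):
    return theta_amp * np.sin(2 * np.pi * 6 * t) + rng.standard_normal((n_channels, t.size))


# One mean theta relative power value per synthetic "subject" in each group.
hc_theta = np.array([relative_band_power(synth(0.5), fs)["theta"].mean() for _ in range(10)])
ad_theta = np.array([relative_band_power(synth(1.0), fs)["theta"].mean() for _ in range(30)])

# Welch's t-test: unequal variances and unequal group sizes.
t_stat, p_value = ttest_ind(ad_theta, hc_theta, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```

Welch's t-test is a natural fit when the groups differ greatly in size, as with 22 HCs and 160 patients with AD, because it does not assume equal variances.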
In patients with AD (vs. HCs), alpha power showed no distinct pattern between the eyes-closed and eyes-open conditions, whereas theta power (4-8 Hz) was higher in all regions (p < 0.05). Within the cross-correlation dataset, interhemispheric cross-correlation in the temporal lobes exhibited the most distinguishable distribution. An RFC analysis exploring 512 models with varied hyperparameters, followed by feature-importance analysis based on the mean decrease in impurity, highlighted "theta relative power" and "interhemispheric cross-correlation of channel pairs including temporal channels" as the most important features for distinguishing patients with AD from HCs. An RFC trained on a theta-band-filtered cross-correlation dataset, chosen on the basis of these important features, demonstrated their robustness across models with different hyperparameter settings.
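The hyperparameter sweep and occurrence-based importance measure described above can be sketched as follows. This is an illustrative reading rather than the authors' implementation. It assumes scikit-learn's `RandomForestClassifier`, whose `feature_importances_` attribute is the mean decrease in impurity. The grid here has 8 configurations rather than 512. The feature names, the top-2 cutoff, and the synthetic data, including the direction of the group differences, are hypothetical.

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid

# Hypothetical feature names; the real study used many more features.
feature_names = ["delta_rel", "theta_rel", "alpha_rel", "beta_rel", "T3-T4_xcorr", "F3-F4_xcorr"]

# Synthetic data with the abstract's group sizes (22 HCs, 160 patients with AD).
rng = np.random.default_rng(0)
n_hc, n_ad = 22, 160
y = np.r_[np.zeros(n_hc, dtype=int), np.ones(n_ad, dtype=int)]
X = rng.standard_normal((n_hc + n_ad, len(feature_names)))
X[y == 1, 1] += 3.0  # make "theta_rel" informative (synthetic shift)
X[y == 1, 4] -= 3.0  # make "T3-T4_xcorr" informative (arbitrary direction)

# Small illustrative grid: 2 x 2 x 2 = 8 configurations.
grid = ParameterGrid({
    "n_estimators": [50, 100],
    "max_depth": [3, None],
    "max_features": ["sqrt", 0.5],
})

top_k = 2  # assumed cutoff for counting a feature as "important"
counts = Counter()
for params in grid:
    model = RandomForestClassifier(random_state=0, **params).fit(X, y)
    mdi = model.feature_importances_  # mean decrease in impurity
    for idx in np.argsort(mdi)[-top_k:]:
        counts[feature_names[idx]] += 1

# Fraction of configurations in which each feature ranked in the top k.
occurrence = {name: counts[name] / len(grid) for name in feature_names}
for name, frac in sorted(occurrence.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {frac:.2f}")
```

Counting how often a feature ranks highly across configurations, rather than reading importances from a single model, makes the ranking less sensitive to any one hyperparameter choice.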
The models achieved over 97% accuracy and 100% recall on the test sets, although this extraordinarily high accuracy should be interpreted with caution given the small, highly imbalanced dataset and the absence of external validation. This methodology demonstrates the efficacy of EEG-based metrics and machine learning in improving our understanding of EEG characteristics in patients with AD, emphasizing the potential of integrating machine learning techniques into clinical practice.