Sun Ping, Wang Xiangwen, Wang Shenghai, Jia Xueyu, Feng Shunkang, Chen Jun, Fang Yiru
Qingdao Mental Health Center, Shandong 266034, China.
Clinical Research Center, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200030, China.
IBRO Neurosci Rep. 2024 Jul 31;17:145-153. doi: 10.1016/j.ibneur.2024.07.007. eCollection 2024 Dec.
To construct a diagnostic model for the depressive phase of bipolar disorder (BD) from patients' peripheral tissue RNA data by combining Random Forest and Feedforward Neural Network methods.
Datasets GSE23848, GSE39653, and GSE69486 were selected, and differential gene expression analysis was conducted using the limma package in R. Key genes were identified from the differentially expressed genes using the Random Forest method. The expression levels of these key genes in each sample were used to train a Feedforward Neural Network model. L1 regularization, early stopping, and dropout layers were employed to prevent overfitting. Model performance was then validated, followed by GO, KEGG, and protein-protein interaction network analyses.
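The three overfitting controls named above can be illustrated in isolation. The following is a minimal sketch in plain Python, not the authors' implementation: the function names, the penalty weight `lam`, the dropout `rate`, and the `patience` value are all hypothetical, since the abstract does not report the hyperparameters used.

```python
import random


def l1_penalty(weights, lam):
    """L1 regularization term: lam times the sum of absolute weights.

    Added to the training loss, it pushes small weights toward zero.
    """
    return lam * sum(abs(w) for w in weights)


def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate` and
    rescale the survivors so the expected activation is unchanged."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) as 0-based indices.

    Training stops once the validation loss has failed to improve by
    more than `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation-loss curve (illustrative values only).
stop_epoch, best_epoch = early_stopping_epoch(
    [0.90, 0.70, 0.60, 0.65, 0.62, 0.61, 0.50], patience=3
)
dropped = dropout([1.0] * 10, rate=0.5, rng=random.Random(0))
```

In practice these would normally come from a deep learning framework rather than be hand-written; the sketch only shows what each technique does to the loss, the activations, and the training loop.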
The final model was a Feedforward Neural Network with two hidden layers and two dropout layers, comprising 2345 trainable parameters. Model performance on the validation set, assessed through 1000 bootstrap resampling iterations, demonstrated a specificity of 0.769 (95% CI 0.571-1.000), sensitivity of 0.818 (95% CI 0.533-1.000), AUC value of 0.832 (95% CI 0.642-0.979), and accuracy of 0.792 (95% CI 0.625-0.958). Enrichment analysis of key genes indicated no significant enrichment in any known pathways.
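A percentile bootstrap of the kind described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code. The abstract does not report validation-set counts. The toy labels below (11 cases with 9 detected, 13 controls with 10 correctly rejected) are an assumption chosen only because they reproduce the reported point estimates (9/11 ≈ 0.818, 10/13 ≈ 0.769, 19/24 ≈ 0.792). AUC is omitted because it requires per-sample scores, which are not available here.

```python
import random


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = case).

    Returns None when a class is absent, since sensitivity or
    specificity would then be undefined.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        return None
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence intervals for each metric."""
    rng = random.Random(seed)
    n = len(y_true)
    samples = {"sensitivity": [], "specificity": [], "accuracy": []}
    for _ in range(n_boot):
        # Resample the validation set with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        m = confusion_metrics([y_true[i] for i in idx],
                              [y_pred[i] for i in idx])
        if m is None:  # skip degenerate resamples with a missing class
            continue
        for name, value in m.items():
            samples[name].append(value)
    ci = {}
    for name, values in samples.items():
        values.sort()
        lo = values[int((alpha / 2) * (len(values) - 1))]
        hi = values[int((1 - alpha / 2) * (len(values) - 1))]
        ci[name] = (lo, hi)
    return ci


# Assumed toy validation set: 11 cases (9 detected), 13 controls (10 rejected).
y_true = [1] * 11 + [0] * 13
y_pred = [1] * 9 + [0] * 2 + [0] * 10 + [1] * 3

point = confusion_metrics(y_true, y_pred)
ci = bootstrap_ci(y_true, y_pred, n_boot=1000)
```

With a validation set this small, each metric can only take a handful of discrete values, which is one reason the reported intervals are wide and reach 1.000 at the upper end.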
Key genes with biological significance were identified based on the decrease in Gini coefficient in the Random Forest model. Combining Random Forest with a Feedforward Neural Network yielded a diagnostic model with good classification performance for bipolar disorder.
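The Gini-based ranking mentioned above is usually implemented as the mean decrease in Gini impurity across the splits of a forest (reported, for example, as `MeanDecreaseGini` by the R `randomForest` package). Reading the abstract's "decrease in Gini coefficient" this way is an interpretation, not something the abstract states. The sketch below computes the decrease for a single split on one gene; the expression values, labels, and threshold are invented for illustration.

```python
def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting the samples at `threshold`.

    Parent impurity minus the size-weighted impurity of the two children.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


# Hypothetical gene that separates the classes perfectly at 2.0.
informative = gini_decrease(
    [1.0, 1.2, 1.1, 3.0, 3.2, 2.9], [0, 0, 0, 1, 1, 1], threshold=2.0
)
# Hypothetical gene whose split leaves both children as mixed as the parent.
uninformative = gini_decrease(
    [1.0, 3.0, 1.0, 3.0], [0, 0, 1, 1], threshold=2.0
)
```

A Random Forest accumulates these decreases over every split that uses a given gene and averages them across trees, so genes that repeatedly produce purer child nodes rank higher.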