Zuo Z, Meng Q, Cui J, Guo K, Bian H
School of Orthopedics and Traumatology, Henan University of Traditional Chinese Medicine, Department of Rheumatology/Henan Provincial Hospital of Traditional Chinese Medicine, Zhengzhou 450008, China.
Henan Key Laboratory of Zhang Zhongjing Formulae and Herbs for Immunoregulation, Nanyang Institute of Technology, Nanyang 473004, China.
Nan Fang Yi Ke Da Xue Xue Bao. 2024 May 20;44(5):920-929. doi: 10.12122/j.issn.1673-4254.2024.05.14.
To establish a diagnostic model for scleroderma based on mitochondria-related genes by combining machine learning with an artificial neural network.
The scleroderma datasets GSE95065 and GSE59785 from the Gene Expression Omnibus (GEO) database were used to analyze the expression of mitochondria-related genes. Differentially expressed genes were identified, and key genes among them were screened using random forest, LASSO regression, and support vector machine (SVM) algorithms. Based on these genes, an artificial neural network model was constructed, and its diagnostic accuracy was evaluated by 10-fold cross-validation and by ROC curve analysis on the validation dataset GSE76807. The mRNA expression of the key genes was verified by RT-qPCR in a mouse model of scleroderma. The CIBERSORT algorithm was used to estimate immune cell infiltration and its association with the screened biomarkers.
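The evaluation described above (10-fold cross-validation scored by the area under the ROC curve) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not specify the network architecture, so a hypothetical nearest-centroid scorer stands in for the artificial neural network, and the helper names (`roc_auc`, `k_fold_indices`, `centroid_scorer`, `cross_validated_auc`) are illustrative only. AUC is computed in its Mann-Whitney form, i.e. the probability that a randomly chosen case scores higher than a randomly chosen control.

```python
import random
from typing import Callable, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Mann-Whitney formulation; tied pairs count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def centroid_scorer(X_train, y_train) -> Callable[[Sequence[float]], float]:
    """Hypothetical stand-in for the ANN (not the paper's model).

    Score = squared distance to the control centroid minus squared distance
    to the case centroid, so higher scores look more like scleroderma cases.
    """
    def centroid(label: int) -> List[float]:
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_case, c_ctrl = centroid(1), centroid(0)

    def dist2(a, b) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return lambda x: dist2(x, c_ctrl) - dist2(x, c_case)


def cross_validated_auc(X, y, k: int = 10, seed: int = 0) -> float:
    """Fit on k-1 folds, score the held-out fold, then pool all out-of-fold scores into one AUC."""
    scores = [0.0] * len(y)
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = centroid_scorer([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            scores[i] = model(X[i])
    return roc_auc(y, scores)
```

Pooling the out-of-fold scores before computing a single AUC keeps every fold usable even when a small fold happens to contain only one class; averaging per-fold AUCs is an equally common alternative, and the abstract does not say which variant was used.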
A total of 24 differentially expressed genes were obtained, including 11 up-regulated and 13 down-regulated genes. The three machine learning algorithms screened out the seven most relevant mitochondria-related genes (POLB, GSR, KRAS, NT5DC2, NOX4, IGF1, and TGM2), from which the artificial neural network diagnostic model was constructed. The model achieved an area under the ROC curve (AUC) of 0.984 for scleroderma diagnosis (0.740 on the validation dataset and 0.980 in cross-validation). RT-qPCR detected significant up-regulation of POLB, GSR, KRAS, NOX4, IGF1, and TGM2 mRNAs and significant down-regulation of NT5DC2 in the mouse model of scleroderma. Immune cell infiltration analysis showed that the differentially expressed genes in scleroderma were associated with follicular helper T cells, immature B cells, resting dendritic cells, activated memory CD4+ T cells, M0 macrophages, monocytes, resting memory CD4+ T cells, and activated mast cells.
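Up- and down-regulation in RT-qPCR data is often quantified with the Livak 2^-ΔΔCt method. The abstract does not state which quantification method or reference gene was used, so the following is only a sketch under that assumption: the function names are hypothetical, and any Ct values passed in are illustrative rather than data from the study. A fold change above 1 corresponds to up-regulation in the model group relative to controls, and below 1 to down-regulation.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target: Sequence[float], reference: Sequence[float]) -> float:
    """Mean per-sample ΔCt = Ct(target gene) - Ct(reference gene), paired by sample."""
    if not target or len(target) != len(reference):
        raise ValueError("target and reference Ct lists must be non-empty and paired")
    return mean(t - r for t, r in zip(target, reference))


def relative_expression(target_model: Sequence[float], ref_model: Sequence[float],
                        target_ctrl: Sequence[float], ref_ctrl: Sequence[float]) -> float:
    """Assumed Livak 2^-ΔΔCt fold change of the model group relative to controls."""
    dd_ct = delta_ct(target_model, ref_model) - delta_ct(target_ctrl, ref_ctrl)
    return 2.0 ** (-dd_ct)


def direction(fold_change: float) -> str:
    """Label the direction of change implied by a fold change."""
    if fold_change > 1:
        return "up"
    if fold_change < 1:
        return "down"
    return "unchanged"
```

This only yields the direction and size of the change; a claim of significant up- or down-regulation additionally requires a statistical test across biological replicates (for example on the per-sample ΔCt values).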
The artificial neural network diagnostic model for scleroderma established in this study provides a new perspective for exploring the pathogenesis of scleroderma.