Cui Xu, Ji Houlin, Guo Shengyang, Liu Ju, Zhang Linyuan, Jia Yongwei, Cui Yin, Zhou Xiaoxiao
Department of Orthopedics, Shanghai University of Medicine and Health Sciences Affiliated Zhoupu Hospital, Shanghai, China.
Jinji Lake Community Health Service Center of Suzhou Industrial Park, Suzhou, China.
Front Genet. 2025 Aug 1;16:1595676. doi: 10.3389/fgene.2025.1595676. eCollection 2025.
To construct a machine-learning-based diagnostic model for osteoarthritis built on methylation-related genes, and to analyze its prognostic value and biological functions.
The GSE63695 and GSE162484 datasets, which include human osteoarthritis (OA) and normal samples, were downloaded from the Gene Expression Omnibus (GEO) database. Chondrocyte microarray data were analyzed in R to identify differentially methylated genes. Candidate genes were selected by support vector machine-recursive feature elimination (SVM-RFE) and least absolute shrinkage and selection operator (LASSO) regression, and a diagnostic model for OA was established. Model performance was assessed with receiver operating characteristic (ROC) curves. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed on the genes included in the model.
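The two selection steps named above can be illustrated with a small, dependency-free Python sketch. It is not the authors' R pipeline: the helper names (`standardize`, `lasso_cd`, `linear_svm`, `svm_rfe`, `select_features`) are hypothetical; LASSO here uses a squared-error loss on 0/1 labels with a fixed penalty instead of a cross-validated binomial fit; the linear SVM is trained by plain subgradient descent; and combining the two gene lists by intersection is an assumption, since the abstract does not say how they were merged.

```python
"""Sketch: LASSO and SVM-RFE gene selection (pure Python, illustrative only)."""
from typing import List, Sequence

Matrix = List[List[float]]


def standardize(X: Matrix) -> Matrix:
    """Center each column (feature) and scale it to unit population variance."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0  # guard constant columns
        cols.append([(v - mean) / sd for v in col])
    return [[cols[j][i] for j in range(p)] for i in range(n)]


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values inside [-lam, lam] become exactly zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X: Matrix, y: Sequence[float], lam: float, sweeps: int = 200) -> List[float]:
    """Coordinate descent for (1/2n)||y - Zb||^2 + lam * ||b||_1, Z = standardized X."""
    Z = standardize(X)
    n, p = len(Z), len(Z[0])
    y_mean = sum(y) / n
    r = [v - y_mean for v in y]  # residuals of the centered response; b starts at 0
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # correlation of feature j with the partial residual (feature j added back)
            rho = sum(Z[i][j] * (r[i] + Z[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam)  # unit-variance columns: no rescaling needed
            delta = new_bj - b[j]
            if delta:
                for i in range(n):
                    r[i] -= Z[i][j] * delta
                b[j] = new_bj
    return b


def linear_svm(X: Matrix, y: Sequence[int], lam: float = 0.01,
               lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Full-batch subgradient descent on the L2-regularized hinge loss (no bias term)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            if yi * sum(wj * xij for wj, xij in zip(w, xi)) < 1:  # margin violated
                for j in range(p):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def svm_rfe(X: Matrix, y: Sequence[int]) -> List[int]:
    """Rank features best-first by repeatedly dropping the smallest squared SVM weight."""
    remaining = list(range(len(X[0])))
    eliminated: List[int] = []
    while remaining:
        w = linear_svm([[row[j] for j in remaining] for row in X], y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]  # last feature standing is ranked first


def select_features(X: Matrix, labels: Sequence[int], lam: float = 0.1,
                    top_k: int = 2) -> List[int]:
    """ASSUMPTION: keep features chosen by LASSO that are also in the SVM-RFE top_k."""
    lasso_keep = {j for j, c in enumerate(lasso_cd(X, labels, lam)) if abs(c) > 1e-12}
    signs = [1 if v == 1 else -1 for v in labels]  # SVM expects +1 / -1 labels
    rfe_top = set(svm_rfe(standardize(X), signs)[:top_k])
    return sorted(lasso_keep & rfe_top)


# Toy data: 4 samples x 3 features, orthogonal columns; only feature 0 tracks the label.
X_demo = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
labels_demo = [1, 1, 0, 0]  # 1 = OA, 0 = normal (illustrative, not study data)
print(select_features(X_demo, labels_demo))  # [0]
```

In a real analysis the rows would be samples and the columns methylation-associated genes, and the penalty `lam` and the SVM-RFE cutoff `top_k` would normally be tuned by cross-validation rather than fixed.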
A total of 11 differentially expressed genes (DEGs) were identified: 7 were significantly upregulated and 4 were significantly downregulated. Using the machine learning algorithms, ARHGEF10, ATP11A, NOTCH1, THSD4, NIPA1, SIM2, MAN1C1, ENDOG, CCNC, TAF5, and VPS52 were ultimately included in the model, which effectively diagnosed OA. The area under the curve (AUC) values in GSE63695 and GSE162484 were 0.96 and 0.93, respectively.
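The AUC values above summarize ROC curves. As an illustration only, the sketch below computes AUC for a diagnostic score in two equivalent ways: as the probability that a randomly chosen OA sample scores higher than a randomly chosen normal sample (ties counted as one half), and as the trapezoidal area under the empirical ROC curve. The function names are hypothetical and the scores are toy numbers, not data from GSE63695 or GSE162484.

```python
"""Sketch: ROC curve and AUC for a diagnostic score (pure Python, illustrative only)."""
from typing import List, Sequence, Tuple


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(OA score > normal score) over all OA/normal pairs, counting ties as 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points, lowering the threshold through each distinct score."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(1 for lab in labels if lab == 1)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # tied scores cross the threshold together, giving a diagonal segment
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.4, 0.5, 0.2]  # toy model outputs, not values from the study datasets
labels = [1, 1, 0, 0]          # 1 = OA, 0 = normal
print(auc_mann_whitney(scores, labels))           # 0.75
print(auc_trapezoid(roc_points(scores, labels)))  # 0.75
```

Both functions return the same value for 0/1-labeled input; the pairwise form grows quadratically with sample count and is mainly useful as a check on the curve-based one.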
The methylation-related gene diagnostic model constructed with machine learning algorithms can effectively identify OA.