School of Public Health and Management, Wenzhou Medical University, Wenzhou, China.
South Zhejiang Institute of Radiation Medicine and Nuclear Technology, Wenzhou Medical University, Wenzhou, China.
Front Immunol. 2022 Nov 1;13:1025688. doi: 10.3389/fimmu.2022.1025688. eCollection 2022.
Systemic lupus erythematosus (SLE) is a latent, insidious autoimmune disease. With the development of gene sequencing in recent years, this study aimed to develop a gene-based predictive model for identifying SLE at the genetic level. First, gene expression datasets of SLE whole blood samples were collected from the Gene Expression Omnibus (GEO) database. The datasets were merged and then divided into training and validation datasets at a ratio of 7:3: the training dataset contained 334 SLE samples and 71 healthy samples, and the validation dataset contained 143 SLE samples and 30 healthy samples. The training dataset was used to build the disease risk prediction model, and the validation dataset was used to verify the model's ability to identify SLE. We first analyzed differentially expressed genes (DEGs) and then used Lasso and random forest (RF) to screen out six key genes (OAS3, USP18, RTP4, SPATS2L, IFI27 and OAS1) that are essential for distinguishing SLE from healthy samples. With the six key genes as inputs, we performed five repetitions of 10-fold cross-validation on the RF model and selected the RF model with the optimal mtry. The mean area under the curve (AUC) and mean accuracy of the models were both over 0.95. On the validation dataset, the model had an AUC of 0.948. On an external validation dataset (GSE99967), the model achieved an AUC of 0.810, an accuracy of 0.836, and a sensitivity of 0.921. On a second external validation dataset (GSE185047), consisting entirely of SLE patients, the model yielded a sensitivity of up to 0.954. The final high-throughput RF model had a mean AUC over 0.9, again showing good results.
In conclusion, we identified key genetic biomarkers and successfully developed a novel disease risk prediction model for SLE that can serve as a new aid for SLE risk prediction and contribute to the identification of SLE.
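The 7:3 split and the reported metrics can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the per-class rounding rule, the random seed, and the toy scores are all assumptions made for illustration. Only the class totals (477 SLE and 101 healthy samples, implied by the counts in the abstract) come from the source.

```python
import random


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split indices into train/validation sets, class by class (hypothetical helper)."""
    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(train_frac * len(idx))  # assumed rounding rule
        train_idx += idx[:n_train]
        valid_idx += idx[n_train:]
    return train_idx, valid_idx


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_and_sensitivity(y_true, scores, threshold=0.5):
    """Accuracy and sensitivity (true-positive rate) at a fixed score threshold."""
    pred = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, y_true))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, y_true))
    return (tp + tn) / len(y_true), tp / sum(y_true)


# Class totals implied by the abstract: 334 + 143 SLE (label 1), 71 + 30 healthy (label 0).
labels = [1] * 477 + [0] * 101
train_idx, valid_idx = stratified_split(labels)
train_counts = (sum(labels[i] for i in train_idx), sum(1 - labels[i] for i in train_idx))
valid_counts = (sum(labels[i] for i in valid_idx), sum(1 - labels[i] for i in valid_idx))
print(train_counts, valid_counts)  # (334, 71) (143, 30)

# Toy scores (made up) showing how the three metrics are computed.
toy_y = [1, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.1]
toy_auc = auc(toy_y, toy_scores)
toy_acc, toy_sens = accuracy_and_sensitivity(toy_y, toy_scores)
```

Under these assumptions, a per-class 70% split reproduces the sample counts given in the abstract. Note that AUC requires both classes, so for a dataset containing only SLE patients, such as GSE185047, sensitivity is the metric that can be computed.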