Division of Nephrology, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan.
Department of Internal Medicine, Tri-Service General Hospital Songshan Branch, National Defense Medical Center, Taipei, Taiwan.
J Transl Med. 2023 Feb 3;21(1):76. doi: 10.1186/s12967-023-03931-z.
Identifying patients who are likely to respond to treatment at the time of a renal flare (RF) is important in lupus nephritis (LN), because effective treatment can lower the risk of progression to end-stage kidney disease. However, machine learning (ML)-based models that address this issue are lacking.
Transcriptomic profiles based on DNA microarray data were extracted from the GSE32591 and GSE112943 datasets. Comprehensive bioinformatics analyses were performed to identify disease-defining genes (DDGs). Peripheral blood samples (GSE81622, GSE99967, and GSE72326) were used to evaluate the effect of the DDGs. Single-sample gene set enrichment analysis (ssGSEA) scores of the DDGs were calculated and correlated with specific immunology genes listed in the nCounter panel. GSE60681 and GSE69438 were used to examine the ability of the DDGs to discriminate LN from other renal diseases. K-means clustering was used to partition the DDGs into separate gene sets, and the clustering results were extended to data derived using the nCounter technique. The least absolute shrinkage and selection operator (LASSO) algorithm was used to identify, within each cluster, genes with high predictive value for treatment response after the first RF. LASSO models with tenfold cross-validation were built in GSE200306 and assessed by receiver operating characteristic (ROC) analysis using the area under the curve (AUC). The models were validated using an independent dataset (GSE113342).
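As an illustration of the ROC/AUC assessment mentioned above, the following is a minimal sketch of how an AUC can be computed from predicted scores and binary response labels, using the equivalence between the AUC and the probability that a randomly chosen responder is scored higher than a randomly chosen non-responder. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study's data or code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 (1 = responder, 0 = non-responder; assumed coding)
    scores: iterable of model scores, higher meaning "more likely responder"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")

    # Count responder/non-responder pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up values (not study data).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(f"AUC = {roc_auc(toy_labels, toy_scores):.3f}")
```

With small test sets, an estimate of this kind can carry a wide confidence interval, so it is usually reported together with its interval rather than as a point value alone.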
Forty-five hub genes specific to LN were identified, and eight optimal disease-defining clusters (DDCs) were obtained. The Th1 and Th2 cell differentiation pathway was significantly enriched in DDC-6. LCK in DDC-6, whose expression correlated positively with infiltration by various T-cell subsets, was differentially expressed between responders and non-responders and ranked highly in the regulatory network analysis. The prediction model based on DDC-6 had the best performance (AUC: 0.75; 95% confidence interval: 0.44-1 in the testing set) and showed high precision (0.83), recall (0.71), and F1 score (0.77) in the validation dataset.
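The reported F1 score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below uses only the rounded values quoted above; the helper name `f1_from_precision_recall` is hypothetical.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported for the validation dataset.
reported_precision = 0.83
reported_recall = 0.71

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}")  # prints "F1 = 0.77"
```

The result agrees with the reported F1 score of 0.77 to two decimal places.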
Our study demonstrates that incorporating knowledge of biological phenotypes into the ML model is feasible for evaluating treatment response after the first RF in LN. This knowledge-based incorporation improves the model's transparency and performance. In addition, LCK may serve as a biomarker for T-cell infiltration and a therapeutic target in LN.