School of Software Engineering, Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China.
School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand.
Clin Transl Oncol. 2024 Apr;26(4):936-950. doi: 10.1007/s12094-023-03326-y. Epub 2023 Oct 3.
Diffuse large B-cell lymphoma (DLBCL) is remarkably heterogeneous, yet the subpopulations that could predict prognosis and guide clinical treatment remain poorly defined.
Molecular subgroups were identified in gene expression data from GSE10846 using a consensus clustering algorithm. Gene set enrichment analysis, immune infiltration analysis, and the proposed cell cycle algorithm were applied to explore the biological functions of the subtypes. Univariate and multivariate Cox regression analyses were used to evaluate independent prognostic factors of DLBCL. Finally, key genes were screened by Lasso regression, the Random Forest algorithm, and point-biserial correlation; a prognostic model based on these genes was constructed with the best-performing classifier among seven machine learning algorithms and validated on three external datasets (GSE34171, GSE87371, GSE31312).
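As a rough illustration of one screening step named above, point-biserial correlation measures how strongly a continuous expression value tracks a binary subtype label. The sketch below is not the authors' implementation: the function names, the gene names `gene_a` and `gene_b`, and the toy values are hypothetical, and it assumes the two subtypes are coded as 0 and 1.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(values, labels):
    """Point-biserial correlation between continuous values and 0/1 labels."""
    group1 = [v for v, y in zip(values, labels) if y == 1]
    group0 = [v for v, y in zip(values, labels) if y == 0]
    n1, n0, n = len(group1), len(group0), len(values)
    if n1 == 0 or n0 == 0:
        raise ValueError("both classes must be present")
    spread = pstdev(values)  # population standard deviation of all values
    if spread == 0:
        return 0.0  # a constant feature carries no information about the label
    return (mean(group1) - mean(group0)) / spread * sqrt(n1 * n0 / n**2)


def rank_genes(expression, labels):
    """Return (gene, r) pairs sorted by absolute correlation, strongest first."""
    scores = {gene: point_biserial(vals, labels) for gene, vals in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: four samples, two genes, labels 0/1 for two subtypes.
toy_expression = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],  # rises with the label
    "gene_b": [2.0, 1.0, 2.0, 1.0],  # unrelated to the label
}
toy_labels = [0, 0, 1, 1]
ranked = rank_genes(toy_expression, toy_labels)
```

In a screen of this kind, genes whose absolute correlation exceeds a chosen cutoff would be kept as candidates; the cutoff used in the study is not stated in this abstract.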
Comprehensive genomic analysis of 1,143 DLBCL samples identified two prognostically relevant molecular subtypes: immune-enriched (IME) and cell-cycle-enriched (CCE). A new predictive model comprising seven key genes (SERPING1, TIMP2, NME1, DCTPP1, RFC4, POLE2, and SNRPD1) was then developed using the Support Vector Machine (SVM) algorithm in 414 patients from GSE10846, achieving high prediction accuracy (88.6%) and strong predictive power (AUC = 0.973). Predictive power was similar in the three external testing sets (HR > 1.400, p < 0.05).
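For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how accuracy and AUC can be computed from a classifier's outputs. It is an illustration only, not the authors' evaluation code: the function names and toy inputs are hypothetical, and the AUC is computed with the pairwise (Mann-Whitney) definition, assuming labels coded 0/1 and a higher score meaning class 1.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy outputs for four samples.
toy_truth = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]  # e.g. classifier decision values
toy_predicted = [0, 1, 1, 1]        # labels after thresholding

toy_accuracy = accuracy(toy_truth, toy_predicted)
toy_auc = roc_auc(toy_truth, toy_scores)
```

Accuracy depends on the threshold used to turn scores into labels, whereas AUC summarizes ranking quality across all thresholds, which is why the two are reported side by side.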
Compared with other clinical risk factors, this model could evaluate survival independently and with strong predictive power. Our study constructed a reliable model for predicting two new subtypes of DLBCL, which could guide individualized treatment.