Chapuy Björn, Wood Timothy, Stewart Chip, Dunford Andrew, Wienand Kirsty, Khan Sumbul Jawed, Serin Nazli, Wang Meng, Calabretta Eleonora, Shimono Joji, Van Seters Samantha, Wisemann Sam, Belkin Saveliy, Heimann David, Redd Robert, Shipp Margaret A, Getz Gad
Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA.
Harvard Medical School, Boston, MA.
Blood. 2025 May 1;145(18):2041-2055. doi: 10.1182/blood.2024025652.
Diffuse large B-cell lymphoma (DLBCL) is a clinically and molecularly heterogeneous disease. The increasing recognition and targeting of genetically defined DLBCLs highlight the need for robust classification algorithms. We previously characterized recurrent genetic alterations in DLBCL and identified 5 discrete subtypes, clusters 1 to 5 (C1-C5), with unique mechanisms of transformation, immune evasion, candidate treatment targets, and different outcomes after standard first-line therapy. Herein, we validate the C1 to C5 DLBCL taxonomy in an independent data set and use the expanded series of 699 primary DLBCLs to develop a probabilistic molecular classifier and confirm its performance in an independent test set. Using our previously assigned cluster labels as a reference, we systematically compared multiple machine learning models and strategies for input feature dimensionality reduction with a newly developed performance metric that captured the relationship between accuracy and confidence of class assignments. The winning neural network model, DLBclass, assigned all cases in the training/validation and independent test sets with 91% and 89% accuracies, respectively. In the 75% of cases with confidence >0.7, DLBclass assignments were accurate in 97% of the training/validation set and 98% of the test set. DLBclass enables robust prospective classification of single cases for inclusion in genetically guided clinical trials or practice and represents a framework for the development of genomics-based classification methods in other cancers.
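The relationship between confidence and accuracy reported above (accuracy over all cases versus accuracy among cases with confidence >0.7) can be illustrated with a minimal sketch. This is not the DLBclass implementation and not the authors' newly developed performance metric. It assumes that a classifier outputs one probability per cluster (C1 to C5), that confidence is the largest of those probabilities, and that the assigned cluster is the one with that largest probability. The function name `confidence_summary` and the toy data are hypothetical.

```python
from typing import Dict, Optional, Sequence


def confidence_summary(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float = 0.7,
) -> Dict[str, Optional[float]]:
    """Summarize accuracy overall and among confidently assigned cases.

    Assumption (not from the paper): confidence is the maximum class
    probability, and the predicted class is the index of that maximum.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not labels:
        raise ValueError("at least one case is required")

    # Predicted class = index of the largest probability in each row.
    predicted = [max(range(len(p)), key=p.__getitem__) for p in probs]
    confidence = [max(p) for p in probs]

    correct = [pred == lab for pred, lab in zip(predicted, labels)]
    confident = [c > threshold for c in confidence]

    n_cases = len(labels)
    n_confident = sum(confident)
    n_confident_correct = sum(
        1 for ok, keep in zip(correct, confident) if ok and keep
    )

    return {
        "overall_accuracy": sum(correct) / n_cases,
        "fraction_confident": n_confident / n_cases,
        # Undefined when no case exceeds the threshold.
        "confident_accuracy": (
            n_confident_correct / n_confident if n_confident else None
        ),
    }


# Toy data (hypothetical): 4 cases, 5 clusters indexed 0..4 for C1..C5.
toy_probs = [
    [0.90, 0.04, 0.03, 0.02, 0.01],
    [0.10, 0.75, 0.05, 0.05, 0.05],
    [0.40, 0.35, 0.10, 0.10, 0.05],
    [0.05, 0.05, 0.80, 0.05, 0.05],
]
toy_labels = [0, 1, 1, 2]

summary = confidence_summary(toy_probs, toy_labels, threshold=0.7)
print(summary)
```

In this toy example, the one misclassified case is also the one low-confidence case, so restricting to confident assignments raises accuracy at the cost of leaving some cases unassigned. The abstract reports the same kind of trade-off.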