Luo Yanhong, Li Yongao, Yang Zhenhuan, Zhang Yanbo, Yu Hongmei, Zhao Zhiqiang, Yu Kai, Guo Yujiao, Wang Xueman, Yang Na, Zhang Yan, Zheng Tingting, Zhou Jie
Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, 030001, China.
Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, 030001, China.
BMC Cancer. 2024 Dec 5;24(1):1495. doi: 10.1186/s12885-024-13266-7.
Positron emission tomography/computed tomography (PET/CT) is recommended as the standard imaging modality for staging diffuse large B-cell lymphoma (DLBCL). However, many studies have neglected the role of patients' clinical prognostic factors alongside the quantitative features of PET/CT imaging. In this paper, a multi-view learning (MVL) model is established that makes full use of both clinical and imaging data to predict the prognosis of DLBCL patients and thereby assist doctors in decision-making.
Feature engineering is performed successively on the clinical and imaging data of the study subjects to obtain the study data; it comprises feature extraction, feature screening by recursive feature elimination, and dimensionality reduction by principal component analysis. After the data are divided into training and test sets, an instance weighting method is applied to the training data. Kernel mapping is then performed on the imaging features and the clinical features separately, and the mapped features are processed in the new kernel feature space using kernel canonical correlation analysis (KCCA). Finally, a support vector machine (SVM) is trained on the resulting common kernel subspace. The final overall model, named SVM-2view-KCCA (SVM-2K), was compared with three other multi-view models (Ensemble-SVM, multi-view maximum entropy discrimination, and canonical correlation analysis). Model performance was evaluated on the test data using several binary classification metrics: accuracy, sensitivity, F1 score, area under the curve (AUC), and G-mean.
After feature engineering based on dimensionality reduction and instance weighting, the SVM model improved AUC by 10.5%, sensitivity by 11.9%, accuracy by 9.8%, F1 score by 9.2%, and G-mean by 7.8% on the DLBCL test data. Among the single-view learning models, the SVM-based integration of clinical and imaging features achieved the best overall performance (AUC = 86.3%, accuracy = 91.6%, sensitivity = 83.2%, F1 = 85.7%, and G-mean = 86.1%). Among the MVL models, SVM-2K achieved the best overall performance (AUC = 92.1%, accuracy = 96.9%, sensitivity = 90.9%, F1 = 92.8%, and G-mean = 91.4%), and every MVL model outperformed the best single-view learning model.
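For readers unfamiliar with the reported metrics, the following minimal sketch computes them for a toy set of labels. It assumes the common definition of G-mean as the geometric mean of sensitivity and specificity, which the abstract does not spell out; the function names `binary_metrics` and `auc_score` and the toy data are illustrative only.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, F1, and G-mean from 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        # Assumed definition: geometric mean of sensitivity and specificity.
        "g_mean": math.sqrt(sensitivity * specificity),
    }


def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.1, 0.2, 0.3, 0.35, 0.7]

metrics = binary_metrics(y_true, y_pred)
auc = auc_score(y_true, scores)
```

Unlike accuracy, G-mean drops sharply when either class is poorly recognized, which is why it is often reported alongside sensitivity on imbalanced data.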
MVL models outperformed single-view learning models. Of the MVL models, the proposed SVM-2K achieved the best overall performance and could accurately predict patient prognosis.