Probability calibration-based prediction of recurrence rate in patients with diffuse large B-cell lymphoma.

Authors

Fan Shuanglong, Zhao Zhiqiang, Zhang Yanbo, Yu Hongmei, Zheng Chuchu, Huang Xueqian, Yang Zhenhuan, Xing Meng, Lu Qing, Luo Yanhong

Affiliations

Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China.

Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China.

Publication details

BioData Min. 2021 Aug 13;14(1):38. doi: 10.1186/s13040-021-00272-9.

Abstract

BACKGROUND

Although many patients have a good prognosis with standard therapy, 30-50% of diffuse large B-cell lymphoma (DLBCL) cases may relapse after treatment. Statistical and computational intelligence models are powerful tools for assessing prognosis; however, many of them cannot generate accurate risk (probability) estimates. In this paper, we therefore develop probability calibration-based versions of traditional machine learning algorithms to predict the risk of relapse in patients with DLBCL.

METHODS

Five machine learning algorithms were assessed: naïve Bayes (NB), logistic regression (LR), random forest (RF), support vector machine (SVM) and feedforward neural network (FFNN). Three methods were used to build a probability calibration-based version of each algorithm: Platt scaling (Platt), isotonic regression (IsoReg) and shape-restricted polynomial regression (RPR). Performance comparisons were based on the average results of a stratified hold-out test repeated 500 times. We used the area under the ROC curve (AUC) to evaluate each model's discrimination (i.e., classification ability) and assessed its calibration (i.e., the accuracy of its risk predictions) using the Hosmer-Lemeshow (H-L) goodness-of-fit test, the expected calibration error (ECE), the maximum calibration error (MCE) and the Brier score (BS).
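
As a rough illustration of what two of these calibration maps do, the Python sketch below fits Platt scaling (a one-dimensional logistic regression on a classifier's output scores) and isotonic regression (via the pool-adjacent-violators algorithm) on made-up held-out scores. It is a sketch under stated assumptions, not the authors' implementation: NumPy is assumed, the names `fit_platt` and `fit_isotonic` and the data are hypothetical, this simplified Platt fit uses plain 0/1 targets rather than Platt's smoothed targets, and RPR is not shown.

```python
import numpy as np


def fit_platt(scores, labels, n_iter=50):
    """Platt-style scaling: fit p = sigmoid(a * score + b) by Newton's method.

    Simplification (assumption): plain 0/1 targets, no Platt target smoothing.
    """
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    X = np.column_stack([s, np.ones_like(s)])  # columns: score, intercept
    w = np.zeros(2)                            # w = [a, b]
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y)  # gradient of the negative log-likelihood
        # Hessian of the negative log-likelihood, plus a tiny ridge for stability
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + 1e-8 * np.eye(2)
        w -= np.linalg.solve(hess, grad)
    a, b = w
    return lambda new: 1.0 / (1.0 + np.exp(-(a * np.asarray(new, dtype=float) + b)))


def fit_isotonic(scores, labels):
    """Isotonic regression via pool-adjacent-violators (PAV); returns a step-function map."""
    order = np.argsort(scores, kind="mergesort")
    x = np.asarray(scores, dtype=float)[order]
    y = np.asarray(labels, dtype=float)[order]
    blocks = []  # each block: [sum of labels, count, smallest score, largest score]
    for xi, yi in zip(x, y):
        blocks.append([yi, 1.0, xi, xi])
        # Merge backwards while fitted values would decrease or scores are tied.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
            or blocks[-2][3] == blocks[-1][2]
        ):
            label_sum, count, _, hi = blocks.pop()
            blocks[-1][0] += label_sum
            blocks[-1][1] += count
            blocks[-1][3] = hi
    starts = np.array([blk[2] for blk in blocks])
    means = np.array([blk[0] / blk[1] for blk in blocks])

    def predict(new):
        # Each new score takes the fitted value of the last block starting at or below it.
        idx = np.searchsorted(starts, np.asarray(new, dtype=float), side="right") - 1
        return means[np.clip(idx, 0, len(means) - 1)]

    return predict


# Hypothetical held-out classifier scores and observed relapse labels (1 = relapse).
cal_scores = [-2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0]
cal_labels = [0, 0, 1, 0, 1, 0, 1, 1]
platt = fit_platt(cal_scores, cal_labels)
iso = fit_isotonic(cal_scores, cal_labels)
print(platt([-1.5, 0.0, 1.5]))
print(iso([-1.5, 0.0, 1.5]))
```

In this sketch each map is fitted on scores the base classifier was not trained on and then applied to new scores. Platt scaling forces a sigmoid shape, whereas isotonic regression only assumes that risk does not decrease with the score and therefore returns a step function.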

RESULTS

Sex, stage, International Prognostic Index (IPI), Karnofsky Performance Status (KPS) score, germinal center B-cell (GCB), CD10 and rituximab were significant factors predicting the 3-year recurrence rate of patients with DLBCL. Among the 5 uncalibrated algorithms, the LR (ECE = 8.517, MCE = 20.100, BS = 0.188) and FFNN (ECE = 8.238, MCE = 20.150, BS = 0.184) models were well calibrated, whereas the initial risk estimates of the NB (ECE = 15.711, MCE = 34.350, BS = 0.212), RF (ECE = 12.740, MCE = 27.200, BS = 0.201) and SVM (ECE = 9.872, MCE = 23.800, BS = 0.194) models had large errors. After probability calibration, the biased NB, RF and SVM models were well corrected. The calibration errors of the LR and FFNN models did not improve further with any of the calibration methods. Among the 3 calibration methods, RPR achieved the best calibration for both the RF and SVM models, whereas the benefit of IsoReg was not evident for the NB, RF or SVM models.
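
The ECE, MCE and BS values reported above can be computed roughly as follows. This Python sketch assumes NumPy and 10 equal-width probability bins (the abstract does not state the binning scheme the paper used), and the function name `calibration_errors` and the example data are hypothetical. It returns ECE and MCE on a 0-1 scale; since the reported ECE and MCE exceed 1, they are presumably percentages, so multiply by 100 to compare.

```python
import numpy as np


def calibration_errors(probs, labels, n_bins=10):
    """Return (ECE, MCE, Brier score) for predicted probabilities and 0/1 labels.

    Assumption: equal-width bins on [0, 1]; empty bins are skipped.
    """
    p = np.asarray(probs, dtype=float)
    y = np.asarray(labels, dtype=float)
    # Assign each prediction to a bin; p == 1.0 goes into the last bin.
    bin_ids = np.minimum((p * n_bins).astype(int), n_bins - 1)
    ece, mce = 0.0, 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue
        # Gap between the observed event rate and the mean predicted risk in this bin.
        gap = abs(y[mask].mean() - p[mask].mean())
        ece += mask.mean() * gap  # weighted by the fraction of samples in the bin
        mce = max(mce, gap)       # worst-calibrated bin
    brier = float(np.mean((p - y) ** 2))
    return ece, mce, brier


probs = [0.1, 0.1, 0.9, 0.9]  # hypothetical predicted relapse risks
labels = [0, 1, 1, 1]         # hypothetical observed outcomes
ece, mce, bs = calibration_errors(probs, labels)
print(f"ECE = {100 * ece:.3f}, MCE = {100 * mce:.3f}, BS = {bs:.3f}")
```

ECE averages the per-bin gaps weighted by bin size, MCE keeps only the largest gap, and the Brier score mixes calibration with discrimination, which is why the three can rank models differently.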

CONCLUSIONS

Although all of these algorithms have good classification ability, several cannot generate accurate risk estimates. Probability calibration is an effective way to improve the accuracy of the risk estimates produced by these poorly calibrated algorithms. Our DLBCL risk model demonstrates good discrimination and calibration and has the potential to help clinicians make optimal therapeutic decisions in support of precision medicine.

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fad1/8362168/72a8c5ef5878/13040_2021_272_Fig1_HTML.jpg
