Probability calibration-based prediction of recurrence rate in patients with diffuse large B-cell lymphoma.

Author information

Fan Shuanglong, Zhao Zhiqiang, Zhang Yanbo, Yu Hongmei, Zheng Chuchu, Huang Xueqian, Yang Zhenhuan, Xing Meng, Lu Qing, Luo Yanhong

Affiliations

Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China.

Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China.

Publication information

BioData Min. 2021 Aug 13;14(1):38. doi: 10.1186/s13040-021-00272-9.

DOI: 10.1186/s13040-021-00272-9

PMID: 34389029

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8362168/

Abstract

BACKGROUND

Although many patients have a good prognosis with standard therapy, 30-50% of patients with diffuse large B-cell lymphoma (DLBCL) may relapse after treatment. Statistical and computational intelligence models are powerful tools for assessing prognosis; however, many cannot generate accurate risk (probability) estimates. This paper therefore develops probability calibration-based versions of traditional machine learning algorithms to predict the risk of relapse in patients with DLBCL.

METHODS

Five machine learning algorithms were assessed: naïve Bayes (NB), logistic regression (LR), random forest (RF), support vector machine (SVM) and feedforward neural network (FFNN). For each algorithm, probability calibration-based versions were built with three methods: Platt scaling (Platt), isotonic regression (IsoReg) and shape-restricted polynomial regression (RPR). Performance was compared using the average results of a stratified hold-out test repeated 500 times. The AUC was used to evaluate discrimination (i.e., classification ability), and calibration (i.e., risk prediction accuracy) was assessed with the Hosmer-Lemeshow (H-L) goodness-of-fit test, expected calibration error (ECE), maximum calibration error (MCE) and Brier score (BS).
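
The calibration measures named above can be sketched in a few lines of dependency-free Python. This is an illustration, not the authors' code: the function names are hypothetical, and it assumes 10 equal-width probability bins for ECE and MCE and 10 risk-sorted groups for the H-L statistic, because the abstract does not state the binning scheme. ECE is the sample-weighted mean of the per-bin gap between mean predicted risk and observed event rate, MCE is the largest such gap, and BS is the mean squared difference between predicted probability and the 0/1 outcome.

```python
from typing import List, Tuple


def calibration_errors(probs: List[float], labels: List[int], n_bins: int = 10) -> Tuple[float, float]:
    """Return (ECE, MCE) using equal-width probability bins (assumed: 10)."""
    n = len(probs)
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[idx].append((p, y))
    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_risk = sum(p for p, _ in members) / len(members)
        event_rate = sum(y for _, y in members) / len(members)
        gap = abs(mean_risk - event_rate)
        ece += len(members) / n * gap  # weight each bin by its share of samples
        mce = max(mce, gap)
    return ece, mce


def brier_score(probs: List[float], labels: List[int]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def hosmer_lemeshow(probs: List[float], labels: List[int], groups: int = 10) -> float:
    """H-L chi-square statistic over groups of patients sorted by predicted risk."""
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)  # expected events in this group
        observed = sum(y for _, y in chunk)  # observed events in this group
        mean_risk = expected / len(chunk)
        variance = len(chunk) * mean_risk * (1 - mean_risk)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat
```

The H-L statistic is usually compared with a chi-square distribution with `groups - 2` degrees of freedom (for example via `scipy.stats.chi2.sf`). The abstract's ECE and MCE values (e.g. 8.517, 34.350) exceed 1, so they appear to be reported on a percentage scale; under that assumption, multiply the fractions returned here by 100 before comparing.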

RESULTS

Sex, stage, International Prognostic Index (IPI), Karnofsky Performance Status (KPS), germinal center B-cell (GCB), CD10 and rituximab were significant factors predicting the 3-year recurrence rate of patients with DLBCL. Among the 5 uncalibrated algorithms, the LR (ECE = 8.517, MCE = 20.100, BS = 0.188) and FFNN (ECE = 8.238, MCE = 20.150, BS = 0.184) models were well calibrated. The NB (ECE = 15.711, MCE = 34.350, BS = 0.212), RF (ECE = 12.740, MCE = 27.200, BS = 0.201) and SVM (ECE = 9.872, MCE = 23.800, BS = 0.194) models had large errors in their initial risk estimates. Probability calibration corrected the biased NB, RF and SVM models well, whereas no calibration method further reduced the calibration errors of the LR and FFNN models. Among the 3 calibration methods, RPR achieved the best calibration for both the RF and SVM models, while the benefit of IsoReg was not evident for the NB, RF or SVM models.
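
Two of the three calibration maps compared in this study, Platt scaling and isotonic regression, can be sketched compactly. This is a hedged illustration, not the paper's implementation: the hypothetical `fit_platt` fits a logistic curve sigmoid(a·s + b) to an uncalibrated score s by Newton's method with Platt's smoothed targets, and the hypothetical `fit_isotonic` fits a non-decreasing step function with the pool-adjacent-violators (PAV) algorithm. Both assume the map is fitted on held-out scores not used to train the base classifier. RPR is omitted because it needs a constrained polynomial fit.

```python
import math
from typing import Callable, List


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _nll(a: float, b: float, scores: List[float], targets: List[float]) -> float:
    # Cross-entropy of sigmoid(a*s + b) against the (smoothed) targets.
    total = 0.0
    for s, t in zip(scores, targets):
        p = min(max(_sigmoid(a * s + b), 1e-15), 1 - 1e-15)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total


def fit_platt(scores: List[float], labels: List[int], iters: int = 100) -> Callable[[float], float]:
    """Hypothetical Platt-scaling fit: returns s -> sigmoid(a*s + b)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Platt's smoothed targets keep the fit finite on small or separable sets.
    t_pos, t_neg = (n_pos + 1) / (n_pos + 2), 1 / (n_neg + 2)
    targets = [t_pos if y == 1 else t_neg for y in labels]
    a, b = 0.0, math.log((n_pos + 1) / (n_neg + 1))
    loss = _nll(a, b, scores, targets)
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for s, t in zip(scores, targets):
            p = _sigmoid(a * s + b)
            w = p * (1 - p)
            g_a += (p - t) * s
            g_b += p - t
            h_aa += w * s * s
            h_ab += w * s
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        if abs(det) < 1e-12:
            break
        # Newton direction H^{-1} g for the 2x2 Hessian.
        d_a = (h_bb * g_a - h_ab * g_b) / det
        d_b = (h_aa * g_b - h_ab * g_a) / det
        step = 1.0
        while step > 1e-8:  # backtracking line search
            new_a, new_b = a - step * d_a, b - step * d_b
            new_loss = _nll(new_a, new_b, scores, targets)
            if new_loss < loss:
                break
            step /= 2
        else:
            break  # no decrease found: treat as converged
        gain = loss - new_loss
        a, b, loss = new_a, new_b, new_loss
        if gain < 1e-10:
            break
    return lambda s: _sigmoid(a * s + b)


def fit_isotonic(scores: List[float], labels: List[int]) -> Callable[[float], float]:
    """Hypothetical isotonic fit via pool-adjacent-violators (PAV)."""
    blocks = []  # each block: [sum of labels, count, largest score in block]
    for s, y in sorted(zip(scores, labels)):
        if blocks and blocks[-1][2] == s:
            blocks[-1][0] += y  # pool tied scores into one block
            blocks[-1][1] += 1
        else:
            blocks.append([float(y), 1, s])
        # Merge backwards while the block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            y_sum, count, s_max = blocks.pop()
            blocks[-1][0] += y_sum
            blocks[-1][1] += count
            blocks[-1][2] = s_max
    bounds = [blk[2] for blk in blocks]
    values = [blk[0] / blk[1] for blk in blocks]

    def predict(s: float) -> float:
        # Step function: value of the first block whose upper bound is >= s.
        for bound, value in zip(bounds, values):
            if s <= bound:
                return value
        return values[-1]

    return predict
```

Platt scaling has only two parameters and is therefore stable on small calibration sets, while isotonic regression is more flexible but more prone to overfitting when calibration data are scarce. This sketch's isotonic predictor is a step function; scikit-learn's `IsotonicRegression` instead interpolates linearly between fitted points, and its `CalibratedClassifierCV` offers both maps via `method="sigmoid"` and `method="isotonic"`.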

CONCLUSIONS

Although all of these algorithms have good classification ability, several cannot generate accurate risk estimates. Probability calibration is an effective way to improve the risk-estimate accuracy of these poorly calibrated algorithms. Our DLBCL risk model demonstrates good discrimination and calibration and could help clinicians make optimal therapeutic decisions in support of precision medicine.

Similar articles

Applying probability calibration to ensemble methods to predict 2-year mortality in patients with DLBCL.

BMC Med Inform Decis Mak. 2021 Jan 7;21(1):14. doi: 10.1186/s12911-020-01354-0.

Predict DLBCL patients' recurrence within two years with Gaussian mixture model cluster oversampling and multi-kernel learning.

Comput Methods Programs Biomed. 2022 Nov;226:107103. doi: 10.1016/j.cmpb.2022.107103. Epub 2022 Sep 5.

Comparison of Scaling Methods to Obtain Calibrated Probabilities of Activity for Protein-Ligand Predictions.

J Chem Inf Model. 2020 Oct 26;60(10):4546-4559. doi: 10.1021/acs.jcim.0c00476. Epub 2020 Sep 21.

Predicting Prostate Cancer Upgrading of Biopsy Gleason Grade Group at Radical Prostatectomy Using Machine Learning-Assisted Decision-Support Models.

Cancer Manag Res. 2020 Dec 22;12:13099-13110. doi: 10.2147/CMAR.S286167. eCollection 2020.

Predicting 1-Year Mortality after Hip Fracture Surgery: An Evaluation of Multiple Machine Learning Approaches.

J Pers Med. 2021 Jul 27;11(8):727. doi: 10.3390/jpm11080727.

Deep Learning and Machine Learning with Grid Search to Predict Later Occurrence of Breast Cancer Metastasis Using Clinical Data.

J Clin Med. 2022 Sep 29;11(19):5772. doi: 10.3390/jcm11195772.

Predicting Prolonged Length of Hospital Stay for Peritoneal Dialysis-Treated Patients Using Stacked Generalization: Model Development and Validation Study.

JMIR Med Inform. 2021 May 19;9(5):e17886. doi: 10.2196/17886.

Classifying 2-year recurrence in patients with dlbcl using clinical variables with imbalanced data and machine learning methods.

Comput Methods Programs Biomed. 2020 Nov;196:105567. doi: 10.1016/j.cmpb.2020.105567. Epub 2020 Jun 9.

A Machine Learning Model for Early Prediction and Detection of Sepsis in Intensive Care Unit Patients.

J Healthc Eng. 2022 Mar 26;2022:9263391. doi: 10.1155/2022/9263391. eCollection 2022.

Cited by

Deep learning-based interpretable prediction of recurrence of diffuse large B-cell lymphoma.

BJC Rep. 2025 May 20;3(1):34. doi: 10.1038/s44276-025-00147-0.

Application of machine learning in the management of lymphoma: Current practice and future prospects.

Digit Health. 2024 Apr 16;10:20552076241247963. doi: 10.1177/20552076241247963. eCollection 2024 Jan-Dec.

Construction and Validation of a Novel Nomogram for Predicting the Recurrence of Diffuse Large B Cell Lymphoma Treated with R-CHOP.

Pharmgenomics Pers Med. 2023 Apr 1;16:291-301. doi: 10.2147/PGPM.S399336. eCollection 2023.

Deep learning methods may not outperform other machine learning methods on analyzing genomic studies.

Front Genet. 2022 Sep 23;13:992070. doi: 10.3389/fgene.2022.992070. eCollection 2022.
