Explainable machine learning for chronic lymphocytic leukemia treatment prediction using only inexpensive tests.

Affiliations

Department of Software and Information Systems Engineering, Ben Gurion University of the Negev, P.O.B. 653, Be'er Sheva, 8410501, Israel.

Internal Medicine C, Bnai Zion Medical Center, Haifa, Israel.

Publication information

Comput Biol Med. 2022 Jun;145:105490. doi: 10.1016/j.compbiomed.2022.105490. Epub 2022 Apr 6.

Abstract

BACKGROUND

Chronic lymphocytic leukemia (CLL) is one of the most common types of leukemia in the Western world and mainly affects the elderly. Disease progression is highly heterogeneous, both in the need for treatment and in life expectancy. The current prognostic scoring system for patients with CLL, CLL-IPI, predicts the general course of the disease but is neither a measure of, nor a decision aid for, the need for treatment. Because CLL behaves so heterogeneously, it is important to develop tools that identify whether and when a patient will need treatment. Recently, machine learning (ML) has spread to many public health fields, including the diagnosis and prognosis of diseases.

OBJECTIVE

Existing machine learning methods for CLL treatment prediction rely on expensive tests, such as genetic tests, which makes them unusable in peripheral or low-resource clinics such as those in developing countries. We aim to develop a machine learning model, based only on demographic data and routine laboratory tests, that predicts whether a patient will need treatment for CLL within two years of diagnosis.

METHOD

We conducted a single-center study of adult patients (older than 18 years) who were diagnosed with CLL according to the IWCLL criteria and were under observation at the hematology unit of the Bnai-Zion medical center between 2009 and 2019. Patient data included demographic, clinical, and laboratory measures extracted anonymously from the patients' medical records. All laboratory results from the observation period were extracted for the entire cohort. We evaluated multiple ML approaches for classifying whether a patient would require treatment within a predetermined period of 2 years, and measured model performance using repeated cross-validation. We evaluated SHapley Additive exPlanation (SHAP) for explaining what influences the model's decision. Additionally, we employed a method for extracting a single decision tree from the ML model, which enables the doctor to understand the main logic governing the model's prediction.
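To make the idea behind SHAP concrete, the following is a minimal sketch of exact Shapley values for a single patient. It is not the authors' pipeline: the scoring function `toy_risk`, its coefficients, the feature values, and the reference patient are all hypothetical, chosen only so that the attributions are easy to check by hand. The sketch treats a feature as "absent" by replacing it with the reference value, which is one common convention.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance of a small model.

    A feature that is absent from a coalition takes its baseline value.
    The cost grows exponentially with the number of features, so this is
    only practical for a handful of them.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        # Build a mixed input: instance values inside the coalition,
        # baseline values outside it.
        x = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_risk(x):
    """Hypothetical risk score (NOT the paper's model)."""
    score = 0.9 - 0.1 * x["rbc"] - 0.02 * x["hemoglobin"]
    if x["nlr"] < 0.1:
        score += 0.2
    return score


# Hypothetical patient and reference values, for illustration only.
patient = {"rbc": 3.5, "hemoglobin": 11.0, "nlr": 0.05}
reference = {"rbc": 4.5, "hemoglobin": 14.0, "nlr": 0.3}

phi = shapley_values(toy_risk, patient, reference)
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:+.3f}")
```

The attributions sum to the difference between the patient's score and the reference score, which is the additivity property that lets a per-patient risk be read as a sum of feature contributions. Practical SHAP implementations approximate these values, or exploit model structure, rather than enumerating every coalition.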

RESULTS

The study included 109 patients, 67 of them male (61%). Patients were under observation for a median of 44 months, and the median age was 65 (range: 45-87). During follow-up, 64% of the cohort received therapy. A Gradient Boosting Model (GBM) using all of the extracted variables to identify the need for treatment in the coming two years among patients with CLL achieved an area under the precision-recall curve (AUPRC) of 0.78 (±0.08). An identical GBM without genetic/FISH and flow cytometry (FACS) data, so that it can be used in peripheral clinics, scored an AUPRC of 0.7686 (±0.0837). A Generalized Linear Model (GLM) using the same features scored an AUPRC of 0.7535 (±0.0995). All of the models described above surpassed the performance of CLL-IPI, which was evaluated using the CLL-TIM model. According to the SHAP results, red blood cell (RBC) count was the most predictive variable for the need for treatment, with a high value associated with a low probability of requiring treatment in the coming two years. Additionally, the SHAP method was used to estimate the personal risk of a randomly chosen patient and showed sensible results. A simple decision tree classifier showed that patients with a hemoglobin level below 13 g/dL and a Neutrophil to Lymphocyte Ratio (NLR) below 0.063, who constituted 34% of the patients in our study, had a high probability (76%) of requiring treatment.
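The decision tree finding above can be written as a short rule. The sketch below is illustrative only: the two cutoffs (hemoglobin below 13 g/dL, NLR below 0.063) come from the abstract, while the function names, the assumption that both cell counts are given in the same units, and the example patients are hypothetical. It reproduces one leaf of the tree, not the full classifier, and is not a clinical tool.

```python
def neutrophil_lymphocyte_ratio(neutrophils, lymphocytes):
    """NLR from absolute counts given in the same units (assumed, e.g. 10^9/L)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


def in_high_risk_leaf(hemoglobin_g_dl, neutrophils, lymphocytes,
                      hb_cutoff=13.0, nlr_cutoff=0.063):
    """True if the patient falls in the leaf reported in the abstract:
    hemoglobin below 13 g/dL AND NLR below 0.063."""
    nlr = neutrophil_lymphocyte_ratio(neutrophils, lymphocytes)
    return hemoglobin_g_dl < hb_cutoff and nlr < nlr_cutoff


# Hypothetical patients, for illustration only.
low_hb_low_nlr = in_high_risk_leaf(11.5, neutrophils=3.0, lymphocytes=60.0)
normal_hb = in_high_risk_leaf(14.2, neutrophils=3.0, lymphocytes=60.0)
higher_nlr = in_high_risk_leaf(11.5, neutrophils=3.0, lymphocytes=20.0)

print(low_hb_low_nlr, normal_hb, higher_nlr)
```

In the study cohort, 76% of patients in this leaf went on to require treatment, so membership is a probability statement rather than a certainty.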

CONCLUSIONS

The machine learning algorithms evaluated in this work for predicting the need for treatment in patients with CLL achieved reasonable accuracy, surpassing that of CLL-IPI as evaluated using the CLL-TIM model. Furthermore, a machine learning model trained exclusively on inexpensive features incurred only a modest decrease in performance compared with the model trained on all of the features. Because of the small number of patients in this study, the results need to be validated on a larger population.
