Machine Learning Model Predicts Abnormal Lymphocytosis Associated With Chronic Lymphocytic Leukemia.
Author information
Aoki Joseph, Khalid Omar, Kaya Cihan, Salama Mohamed E
Affiliation
Sonic Healthcare USA, Austin, TX.
Publication information
JCO Clin Cancer Inform. 2025 Jun;9:e2400197. doi: 10.1200/CCI-24-00197. Epub 2025 Jun 24.
PURPOSE
The diagnosis of chronic lymphocytic leukemia (CLL) is often delayed by several years. Addressing this care gap would help identify at-risk patients who may benefit from targeted evaluation to prevent adverse outcomes. To our knowledge, however, no widely used machine learning (ML) models currently predict the development of CLL. The objective of this study was therefore to use readily available laboratory data to train and test ML-based risk models for abnormal lymphocytosis associated with CLL.
METHODS
The observational study used deidentified laboratory data from a large US outpatient network. The 7-year longitudinal data set included 1,090,707 adult patients who met the following inclusion criteria: age 50 to 75 years and initial absolute lymphocyte count (ALC) <5 × 10⁹/L. The data set was split into a training set (80% of the data) and a held-out set for independent testing (20%). ML models were developed using random forest survival methods. The ground truth outcome was abnormal lymphocytosis associated with CLL and monoclonal B-cell lymphocytosis diagnosis: ALC ≥5 × 10⁹/L with ≥40% relative lymphocytosis.
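As an illustration only, the inclusion criteria, outcome definition, and 80/20 split described above can be sketched in plain Python. The function names are hypothetical, and two details are assumptions not stated in the abstract: relative lymphocytosis is computed here as ALC divided by WBC, and the split is made at the patient level with a fixed random seed. This is not the authors' pipeline, and the random forest survival model itself is not shown.

```python
import random

ALC_THRESHOLD = 5.0         # x10^9/L, threshold stated in the abstract
REL_LYMPH_THRESHOLD = 40.0  # percent, threshold stated in the abstract


def is_eligible(age, initial_alc):
    """Inclusion criteria: age 50 to 75 years and initial ALC < 5 x 10^9/L."""
    return 50 <= age <= 75 and initial_alc < ALC_THRESHOLD


def meets_outcome(alc, wbc):
    """Outcome: ALC >= 5 x 10^9/L with >= 40% relative lymphocytosis.

    Assumption: relative lymphocytosis = 100 * ALC / WBC.
    """
    if wbc <= 0:
        return False
    relative_pct = 100.0 * alc / wbc
    return alc >= ALC_THRESHOLD and relative_pct >= REL_LYMPH_THRESHOLD


def split_patients(patient_ids, test_fraction=0.2, seed=0):
    """Shuffle patient IDs and return (train_ids, test_ids).

    Assumption: the split is by patient, so no patient is in both sets.
    """
    ids = sorted(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]
```

For example, a result with ALC 6.0 and WBC 10.0 (both in 10⁹/L) corresponds to 60% lymphocytes and would meet the outcome under these assumptions, whereas ALC 6.0 with WBC 20.0 (30%) would not.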
RESULTS
The 12-variable risk classifier model accurately predicted ALC ≥5 × 10⁹/L within 5 years and achieved an area under the receiver operating characteristic curve (AUROC) of 0.92. The most important predictors were ALC (initial, slope), WBC (last, max, slope, initial), platelet count (last, slope, max, initial), age, and sex.
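The predictors listed above are per-patient summaries of a longitudinal series (initial, last, max, slope). The abstract does not say how these summaries were computed, so the following is only a minimal sketch: it assumes the measurements are already sorted by time and takes the slope as the ordinary least-squares slope of value against time. The names `slope` and `summarize` are hypothetical.

```python
def slope(times, values):
    """Ordinary least-squares slope of values against times.

    Returns 0.0 when there are fewer than two points or no spread in time.
    """
    n = len(times)
    if n < 2:
        return 0.0
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    denom = sum((t - mean_t) ** 2 for t in times)
    if denom == 0:
        return 0.0
    numer = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return numer / denom


def summarize(times, values):
    """Summarize one lab series (assumed non-empty and sorted by time)."""
    return {
        "initial": values[0],
        "last": values[-1],
        "max": max(values),
        "slope": slope(times, values),
    }
```

Applied to ALC, WBC, and platelet series and combined with age and sex, summaries of this kind would yield a feature vector of the shape the abstract describes, although the exact feature definitions in the study may differ.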
CONCLUSION
Our ML risk classifier accurately predicts abnormal lymphocytosis associated with CLL using routine laboratory data. Although prospective studies are warranted, the results support the clinical utility of the model for improving timely recognition of patients at risk of CLL.