Ye Xianfei, Zhao Xinfeng, Lou Yinyu, Pan Hanqi, Chen Yunying
Department of Laboratory Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, People's Republic of China.
Department of Laboratory Medicine, Hangzhou Children's Hospital, Hangzhou, People's Republic of China.
Clin Chem Lab Med. 2025 May 28. doi: 10.1515/cclm-2025-0302.
This study aimed to develop and validate a machine learning (ML) model that screens cerebrospinal fluid (CSF) for malignant cells using body fluid parameters reported by hematology analyzers.
We analyzed 643 consecutive CSF samples from patients with central nervous system symptoms, with 191 samples classified as positive for malignant cells based on cytological examination, for model derivation. Body fluid parameters were measured using the body fluid mode of a hematology analyzer. Least Absolute Shrinkage and Selection Operator (LASSO) regression was applied to identify predictive biomarkers, followed by performance evaluations of six ML algorithms. Model interpretability was assessed using SHapley Additive exPlanations (SHAP). The selected model was also externally validated with an additional 136 CSF samples.
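The LASSO selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the coordinate-descent solver, the penalty `alpha=0.05`, the column names `hf_pct`, `mono_pct` and `unrelated`, and the toy design matrix are all assumptions made for illustration, and none of the values come from the study data. The sketch only shows how an L1 penalty shrinks the coefficient of an uninformative parameter to exactly zero, so that only the remaining parameters are passed on to the downstream classifiers.

```python
from itertools import product


def standardize(column):
    """Scale a column to zero mean and unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / std for v in column]


def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha; values inside [-alpha, alpha] become 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_coordinate_descent(columns, y, alpha, n_sweeps=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 on a centred target."""
    n = len(y)
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]  # all weights start at zero
    weights = [0.0] * len(columns)
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            z = sum(x * x for x in col) / n
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(x * (r + weights[j] * x) for x, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha) / z
            delta = new_w - weights[j]
            residual = [r - delta * x for x, r in zip(col, residual)]
            weights[j] = new_w
    return weights


# Hypothetical toy design (NOT study data): two informative columns and one
# column that has no relationship to the label.
levels = [-2.0, -1.0, 1.0, 2.0]
rows = list(product(levels, levels, [-1.0, 1.0]))
y = [1.0 if 3 * a + 2 * b > 0 else 0.0 for a, b, _ in rows]

names = ["hf_pct", "mono_pct", "unrelated"]  # illustrative names only
columns = [standardize([row[j] for row in rows]) for j in range(3)]

weights = lasso_coordinate_descent(columns, y, alpha=0.05)
selected = [name for name, w in zip(names, weights) if w != 0.0]
print(selected)  # prints ['hf_pct', 'mono_pct']
```

In practice the penalty strength would normally be tuned by cross-validation rather than fixed by hand, and the retained parameters would then be used to train each candidate classifier.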
The median leukocyte (WBC) and total nucleated cell (TNC) counts in the cytology-positive samples were significantly lower than those in the cytology-negative samples (5.4 vs. 31.8 and 7.4 vs. 32.6, respectively, p<0.001). The support vector machine (SVM) model achieved the highest area under the curve (AUC) of 0.899 (SD: 0.035) and the highest sensitivity of 0.827 (SD: 0.059) in internal validation. SHAP analysis identified the percentage of high-fluorescence cells and monocytes as the two most influential predictors, with higher values of both associated with a positive cytology result. External validation demonstrated a comparable AUC and sensitivity, supporting the model's generalizability.
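For readers less familiar with the two performance measures reported above, the following sketch shows how AUC and sensitivity can be computed from a classifier's scores. The labels, scores, and the 0.5 decision threshold are hypothetical toy values, not results from the study, and the helper names `roc_auc` and `sensitivity` are introduced here for illustration only.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity(labels, scores, threshold):
    """Fraction of true positives among all cytology-positive samples."""
    tp = sum(1 for label, s in zip(labels, scores) if label == 1 and s >= threshold)
    fn = sum(1 for label, s in zip(labels, scores) if label == 1 and s < threshold)
    return tp / (tp + fn)


# Hypothetical toy data: 1 = cytology-positive, 0 = cytology-negative.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]

auc = roc_auc(labels, scores)                            # 20 of 24 pairs ranked correctly
sens = sensitivity(labels, scores, threshold=0.5)        # 3 of 4 positives flagged
print(round(auc, 3), sens)  # prints 0.833 0.75
```

AUC does not depend on the decision threshold, whereas sensitivity does, which is why a screening model is usually reported with both.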
We developed an ML model that predicts cytological outcomes in CSF using routinely available body fluid parameters. The model demonstrated consistent performance during external validation.