Li Jinjie, Hao Xiaoyan, Xin Yijuan, Li Rui, Zhu Lin, Cheng Xiaoli, Yang Liu, Liu Jiayun
Department of Clinical Laboratory Medicine, Xijing Hospital, Air Force Medical University, Xi'an 710032, China.
Department of Clinical Laboratory Medicine, Xijing Hospital, Air Force Medical University, Xi'an 710032, China. *Corresponding authors, E-mail:
Xi Bao Yu Fen Zi Mian Yi Xue Za Zhi. 2025 Apr;41(4):339-347.
Objective To mine and analyze routine blood test data from children with allergic rhinitis (AR), identify routine blood parameters associated with childhood AR, establish an effective diagnostic model, and evaluate its performance. Methods This was a retrospective study of clinical cases. The AR group comprised 1110 children diagnosed with AR at the First Affiliated Hospital of Air Force Medical University between December 12, 2020 and December 12, 2021; the control group comprised 1109 children with no history of AR or other allergic diseases who underwent routine physical examinations during the same period. Age, sex, and routine blood test results were collected for all subjects. Routine blood test indicators were compared between children with AR and healthy children using comprehensive intelligent baseline analysis, and indicators with P≥0.05 were excluded; variables were then screened by Lasso regression. Binary logistic regression was used to further evaluate the influence of the multiple routine blood indicators on the outcome. Diagnostic models were established with five machine learning algorithms: extreme gradient boosting (XGBoost), logistic regression (LR), gradient boosting decision tree (LightGBM; LGBMC), random forest (RF), and adaptive boosting (AdaBoost). Receiver operating characteristic (ROC) curves were used to select the optimal model. The best-performing algorithm, LightGBM, was used to build an online patient risk assessment tool for clinical application.
Results Statistically significant differences were observed between the AR group and the control group in age and in the following routine blood test indicators: mean corpuscular hemoglobin concentration (MCHC), hemoglobin (HGB), absolute basophil count (BASO), absolute eosinophil count (EOS), platelet-large cell ratio (P-LCR), mean platelet volume (MPV), platelet distribution width (PDW), platelet count (PLT), absolute neutrophil count (W-LCC), absolute monocyte count (W-MCC), and absolute lymphocyte count (W-SCC). Lasso regression identified these variables as important predictors, and binary logistic regression further showed that they had a significant influence on the outcome. The optimal machine learning algorithm, LightGBM, was used to establish a multi-indicator joint detection model. The model showed robust predictive performance, with AUC values of 0.8512 in the training set and 0.8103 in the internal validation set. Conclusion The identified routine blood parameters can serve as potential biomarkers for early diagnosis and risk assessment of AR, which can improve the accuracy and efficiency of diagnosis. The established model provides a scientific basis for more accurate diagnostic tools and personalized prevention strategies. Future studies should prospectively validate these findings and explore their applicability to other related diseases.
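The model selection step described in the Methods (comparing candidate classifiers by ROC performance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `roc_auc` and `best_model_by_auc`, the model labels, and the toy scores are all hypothetical and chosen only for illustration. The sketch computes the area under the ROC curve through its rank interpretation, that is, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and then picks the candidate with the largest AUC.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_model_by_auc(labels: Sequence[int],
                      model_scores: Dict[str, Sequence[float]]) -> str:
    """Return the name of the candidate model with the highest AUC."""
    aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
    return max(aucs, key=aucs.get)


# Made-up toy data (NOT from the study): 1 = AR, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
toy_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],  # hypothetical predicted risks
    "model_b": [0.6, 0.3, 0.7, 0.5, 0.4, 0.8],
}

for name, scores in toy_scores.items():
    print(name, round(roc_auc(labels, scores), 4))
print("selected:", best_model_by_auc(labels, toy_scores))
```

The pairwise loop is quadratic in the number of cases and is meant for clarity only; on a cohort of the size reported here, a sort-based AUC routine from an established statistics library would be the more practical choice. The comparison should be made on held-out validation scores rather than training scores, in line with the internal validation set mentioned in the Results.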