Hu Xiefei, Duan Chunmei, Chen Huajian, Li Xun, Jing Qianyu, Ma Qin, Cai Shunli, Fan Haiping, Zhi Shenshen, Li Wei
Department of Clinical Laboratory, Chongqing Emergency Medical Center, School of Medicine, Chongqing University Central Hospital, Chongqing University, Chongqing, China.
Department of Clinical Laboratory, The Second Affiliated Hospital of Army Medical University, Chongqing, China.
BMC Infect Dis. 2025 Sep 1;25(1):1088. doi: 10.1186/s12879-025-11502-4.
Influenza A (IAV) and B (IBV) viruses are the primary etiologic agents of seasonal influenza epidemics and global pandemics. Early identification is crucial for epidemic control and for reducing mortality. The complete blood count (CBC), a widely used clinical test, provides rapid, minimally invasive hematological biomarkers that offer diagnostic value before the pathogen is confirmed. This study proposes a machine-learning (ML) approach that uses CBC parameters to distinguish IAV and IBV infections from other infections. This approach may complement nucleic acid tests and antigen assays, enabling timely intervention and reducing diagnostic delays.
This study retrospectively collected CBC data from patients presenting with influenza-like symptoms at Chongqing Emergency Medical Center, Chongqing, China. Records of patients who met the inclusion criteria between January 1, 2023, and December 31, 2023, formed the model development dataset, which was partitioned into training and internal validation subsets at an 8:2 ratio. An independent external validation cohort was collected from January 1, 2024, to February 29, 2024. We trained several ML models on 25 features to predict influenza A and B infection and calculated SHapley Additive exPlanations (SHAP) values to interpret the models.
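The abstract does not describe how the SHAP values were computed. As a rough illustration of what a Shapley value measures, the sketch below enumerates every feature coalition exactly for a single prediction, assuming a baseline-replacement value function (features outside a coalition are set to a reference value). The toy model, the feature values and the helper name `exact_shapley` are hypothetical and are not taken from the study.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence


def exact_shapley(model: Callable[[Sequence[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    """Exact Shapley value of each feature for the single prediction model(x).

    Assumption: a coalition's value is the model output when features in the
    coalition keep their values from x and all other features take baseline values.
    """
    n = len(x)

    def value(coalition: FrozenSet[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0..n-1 drawn from the other features
            # Shapley weight: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # marginal contribution of feature i when added to coalition s
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical risk score over three standardized features, with one interaction term.
def toy_model(z: Sequence[float]) -> float:
    return 0.5 * z[0] - 0.3 * z[1] + 0.2 * z[0] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
print([round(p, 3) for p in phi])  # [0.8, -0.6, 0.3]
```

The attributions sum to f(x) − f(baseline) (the efficiency property), and the interaction term's contribution is split equally between the two features involved. Exhaustive enumeration needs on the order of 2^(n−1) model evaluations per feature, which is impractical for 25 features; SHAP implementations therefore rely on model-specific algorithms (such as TreeSHAP for tree ensembles) or sampling approximations.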
The study cohort comprised 3,106 patients (453 influenza-positive cases, 14.6%; 2,653 negative controls, 85.4%). Of these, 2,925 eligible cases formed the model development dataset and were stratified into training (n = 2,340) and internal validation (n = 585) subsets through an 8:2 split. An independent external validation cohort comprised 181 patients. In external validation, a voting ensemble of adaptive boosting (ADB) and extreme gradient boosting (XGB) achieved an area under the receiver operating characteristic curve (AUROC) of 0.810. SHAP analysis of the random forest (RF) model identified the five hematologic parameters with the greatest predictive influence: monocyte percentage (MON%), lymphocytes (LYM), white blood cells (WBC), red blood cells (RBC), and the neutrophil-to-monocyte ratio (NEU/MON).
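The abstract does not state whether the ADB-XGB ensemble used hard or soft voting. The sketch below assumes soft voting with equal weights (averaging each model's predicted probability of influenza) and computes AUROC directly from its definition: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The function names `soft_vote` and `auroc` and all probabilities are hypothetical illustrations, not study data.

```python
from typing import List, Optional, Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-model positive-class probabilities (soft voting)."""
    if weights is None:
        weights = [1.0] * len(prob_lists)  # assumption: equal model weights
    total_w = sum(weights)
    n = len(prob_lists[0])
    return [sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total_w
            for i in range(n)]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs for five patients (1 = influenza-positive).
labels = [1, 1, 0, 0, 0]
p_adb = [0.7, 0.4, 0.5, 0.2, 0.1]
p_xgb = [0.9, 0.3, 0.35, 0.4, 0.2]
ensemble = soft_vote([p_adb, p_xgb])
for name, scores in [("ADB", p_adb), ("XGB", p_xgb), ("ADB-XGB", ensemble)]:
    print(name, round(auroc(labels, scores), 3))  # ADB 0.833, XGB 0.667, ADB-XGB 0.833
```

The pairwise loop costs O(P × N) comparisons, which is acceptable for a cohort of a few hundred patients such as the 181-patient external set; a rank-based (Mann-Whitney U) formulation yields the same value in O(n log n).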
This analysis identifies the RF and ADB-XGB models as the optimal CBC-based ML frameworks for identifying influenza A and B infections. Their operational simplicity enables rapid triage in resource-constrained emergency departments, which is particularly valuable when molecular confirmation by RT-PCR is unavailable.
The online version contains supplementary material available at 10.1186/s12879-025-11502-4.