Bai Fang, Gong Zelong, Cui Dong, Zhang Xiaomei, Hong Wenteng, Gao Yi, Lin Kai, Chen Weijie, Li Lu, Huang Juan, Zheng Biying, Xu Junfa, Xiao Na
Dongguan Key Laboratory of Pathogenesis and Experimental Diagnosis of Infectious Diseases, Institute of Laboratory Medicine of School of Medical Technology, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan, Guangdong, China.
Yantian District Center for Disease Control and Prevention (CDC), Shenzhen, Guangdong, China.
Front Pediatr. 2025 Aug 6;13:1608812. doi: 10.3389/fped.2025.1608812. eCollection 2025.
Early aetiological diagnosis is critical for the management of febrile children with infectious illness, as it strongly influences the choice of appropriate medication and can affect a child's risk of complications and outcome. New diagnostic strategies based on host gene expression have recently been developed and have shown high accuracy and clinical practicability. In this study, we used integrative bioinformatics analysis to construct artificial neural network (ANN; multilayer perceptron) and random forest (RF) models based on host gene signatures to diagnose bacterial or viral (B/V) infection in febrile children.
Whole-blood transcriptome data from children were collected from a public database. Of these, 384 febrile young children (definite bacterial: n = 135; definite viral: n = 249) were used to construct the RF model. For the generalized RF model, 1,042 patients with various aetiological infections were included, such as , pathogenic , , , , , , and . Differential expression analysis and weighted gene co-expression network analysis (WGCNA) identified 117 differentially expressed genes (DEGs) and 264 module member genes, respectively; the two sets overlapped in 57 candidate genes. L1 regularization and variable importance analysis (multilayer perceptron) were then used to reduce and rank the predictive features, identifying LCN2 (100.0%), IFI27 (84.4%), SLPI (63.2%), IFIT2 (44.6%) and PI3 (44.5%) as the top predictors. Using the transformed values RefValue(i) of these five genes, the RF model diagnosed B/V infection in children with an AUC of 0.9917 in training and 0.9517 in testing, and the ANN model achieved an AUC of 0.9540 in testing. A generalized RF model built on the 1,042 patients to predict different aetiological types achieved an AUC of 0.9421 in training and 0.8968 in testing.
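The feature-selection steps described above can be sketched as follows. This is a minimal, pure-Python illustration under stated assumptions, not the authors' pipeline: candidate genes are taken as the intersection of the DEG list and the WGCNA module gene list; L1 regularization is shown as an L1-penalised logistic regression fitted by proximal gradient descent (the abstract does not say which L1 model was used); and features are ranked by absolute coefficient scaled to the strongest predictor, a swapped-in proxy for the multilayer-perceptron importance analysis the study reports. The functions `candidate_genes`, `l1_logistic` and `relative_importance`, the penalty `lam`, and the gene names are hypothetical, and the inputs are assumed to be standardised expression values rather than the study's RefValue(i) transform.

```python
import math

def candidate_genes(degs, module_genes):
    """Candidate genes: DEGs that are also members of the selected WGCNA module."""
    return sorted(set(degs) & set(module_genes))

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def l1_logistic(X, y, lam=0.05, lr=0.1, n_iter=2000):
    """L1-penalised logistic regression fitted by proximal gradient descent (ISTA).

    X: list of samples, each a list of (assumed standardised) expression values.
    y: 0/1 labels (hypothetical coding: 1 = bacterial, 0 = viral).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n  # the intercept is not penalised
        for j in range(p):
            v = w[j] - lr * grad_w[j] / n
            # Soft-thresholding (proximal step of the L1 penalty) sets weak features to exactly 0.
            w[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)
    return w, b

def relative_importance(genes, w):
    """Rank non-zero features by |coefficient|, scaled so the strongest is 100%."""
    top = max((abs(v) for v in w), default=0.0)
    if top == 0.0:
        return []
    ranked = [(g, 100.0 * abs(v) / top) for g, v in zip(genes, w) if v != 0.0]
    return sorted(ranked, key=lambda t: -t[1])

# Toy usage with hypothetical gene names and values (not data from the study).
genes = candidate_genes(["GENE_A", "GENE_B", "GENE_C"], ["GENE_A", "GENE_B", "GENE_D"])
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]  # columns follow `genes`
y = [1, 1, 0, 0]
w, b = l1_logistic(X, y)
ranking = relative_importance(genes, w)
```

In the toy data, GENE_A separates the two classes while GENE_B carries no class information, so the soft-thresholding step keeps GENE_B's coefficient at exactly zero and GENE_A is ranked at 100%. Increasing `lam` generally removes more features.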
A five-gene host signature (IFIT2, SLPI, IFI27, LCN2, and PI3) was identified and used to construct an RF model that distinguishes B/V infection in febrile children with 85.3% accuracy, 95.1% sensitivity, and 80.0% specificity, and an ANN model that achieves 92.4% accuracy, 86.8% sensitivity, and 95.0% specificity.
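The accuracy, sensitivity and specificity quoted above, and the AUCs reported in the results, follow standard definitions; a minimal sketch is below. It assumes bacterial infection is the positive class and a 0.5 decision threshold, neither of which the abstract states, and the labels and scores are made up for illustration rather than taken from the study. The helper names `confusion_metrics` and `auc` are hypothetical.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }

def auc(y_true, scores):
    """ROC AUC as the Mann-Whitney probability that a random positive
    outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 = bacterial (assumed positive class), 0 = viral.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1]  # predicted probability of bacterial infection
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
metrics = confusion_metrics(y_true, y_pred)
roc_auc = auc(y_true, scores)
```

Sensitivity and specificity trade off as the threshold moves, whereas the AUC summarizes ranking quality across all thresholds, so a model can have a high AUC yet an uneven sensitivity/specificity pair at a single operating point.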