Xue Mei, Wang Qiong, Zhang Yicheng, Pang Bo, Yang Min, Deng Xiangling, Zhang Zhixin, Niu Wenquan
Graduate School, Beijing University of Chinese Medicine, Beijing, China.
Department of Pediatrics, China-Japan Friendship Hospital, Beijing, China.
Front Pediatr. 2022 Jun 16;10:911591. doi: 10.3389/fped.2022.911591. eCollection 2022.
We employed machine-learning methods to explore data from a large survey of students, with the goal of identifying and validating a parsimonious panel of important factors associated with lower respiratory tract infection (LRTI).
Cross-sectional cluster sampling was performed for a survey of students aged 6-14 years who attended primary or junior high school in Beijing in January 2022. Data were collected via electronic questionnaires. Statistical analyses were completed using PyCharm (Edition 2018.1 x64) and Python (Version 3.7.6).
Data from 11,308 students (5,527 girls and 5,781 boys) were analyzed; 909 of them had LRTI, a prevalence of 8.01%. After a comprehensive evaluation, the Gaussian naive Bayes (gNB) algorithm outperformed the other machine-learning algorithms, with an accuracy of 0.856, precision of 0.140, recall of 0.165, F1 score of 0.151, and area under the receiver operating characteristic curve (AUROC) of 0.652. Under the optimal gNB algorithm, the top five important factors (age, rhinitis, sitting time, dental caries, and food or drug allergy) had decent prediction performance. In addition, the top five factors had prediction performance comparable to that of all factors modeled. For example, under the sequential deep-learning model, accuracy and loss were 92.26% and 25.62%, respectively, when incorporating the top five factors, and 92.22% and 25.52% when incorporating all factors.
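The reported F1 score can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The following is a minimal sketch of that arithmetic only; the function name `f1_from_precision_recall` is illustrative and is not taken from the study's code.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention when both precision and recall are zero
    return 2 * precision * recall / (precision + recall)


# Values reported for the gNB algorithm in the abstract.
reported_precision = 0.140
reported_recall = 0.165

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")
```

Rounded to three decimals, the result agrees with the reported F1 score of 0.151.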
Our findings show that the top five important factors identified by the gNB algorithm can sufficiently represent all involved factors in predicting LRTI risk among Chinese students aged 6-14 years.
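For readers unfamiliar with Gaussian naive Bayes, the sketch below shows the general technique in plain Python: each class gets a prior plus a per-feature mean and variance, and a sample is assigned to the class with the highest log posterior. This is not the study's implementation. The function names, the variance floor, and the toy data (two hypothetical numeric features with made-up values) are all assumptions for illustration.

```python
import math
from collections import defaultdict


def fit_gnb(X, y, var_floor=1e-9):
    """Estimate each class's prior, feature means, and feature variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor (an assumed constant) avoids division by zero
        # when a feature is constant within a class.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gnb(model, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            # Log of the Gaussian density; features are treated as independent.
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Toy data with made-up values (NOT from the study): two numeric features.
X = [[7, 6.0], [8, 7.5], [6, 6.5], [12, 3.0], [13, 4.0], [14, 3.5]]
y = [1, 1, 1, 0, 0, 0]

model = fit_gnb(X, y)
print(predict_gnb(model, [7, 6.5]), predict_gnb(model, [13, 3.5]))
```

In practice a library implementation such as scikit-learn's `GaussianNB` would normally be used; the hand-written version only makes the per-class Gaussian assumption explicit.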