Fiandrino Stefania, Donà Daniele, Giaquinto Carlo, Poletti Piero, Tira Michael Davis, Di Chiara Costanza, Paolotti Daniela
University of Rome La Sapienza, Rome, Italy.
ISI Foundation, Torino, Italy.
BMJ Public Health. 2025 Jun 3;3(1):e001888. doi: 10.1136/bmjph-2024-001888. eCollection 2025.
The epidemiology and clinical characteristics of COVID-19 have evolved with the emergence of new SARS-CoV-2 variants of concern (VOCs). The higher transmissibility of the Omicron VOC increased paediatric COVID-19 cases and hospital admissions. Most research during the Omicron period has focused on hospitalised cases, leaving a gap in understanding how the disease evolved in community settings. This study targets children with mild to moderate COVID-19 during the pre-Omicron and Omicron periods. It aims to identify patterns in COVID-19 morbidity by clustering individuals based on symptom similarity and symptom duration, and to develop a machine-learning tool that classifies new cases into risk groups.
We propose a data-driven approach to explore changes in COVID-19 characteristics by analysing data from 581 children and adolescents collected within a paediatric cohort at the University Hospital of Padua. First, we apply an unsupervised machine-learning algorithm to cluster individuals into groups. Second, we train a random forest classifier to assign new patients to risk groups, using sociodemographic information, pre-existing medical conditions, vaccination status and the VOC as predictive variables. Third, we explore the key features influencing the classification using SHapley Additive exPlanations (SHAP).
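The clustering step can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, so k-means is used here purely as an assumption. The `kmeans` function, its deterministic initialisation and the twelve (number of symptoms, duration in days) pairs are hypothetical and are not study data.

```python
def kmeans(points, k, iters=50):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples.

    Assumes k >= 2. Initial centroids are k evenly spaced points from the
    sorted input, which keeps the sketch deterministic.
    """
    ordered = sorted(points)
    centroids = [ordered[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical (number of symptoms, symptom duration in days) pairs.
points = [
    (1, 1), (1, 2), (2, 1), (1, 1),
    (2, 6), (3, 7), (2, 7), (3, 6),
    (4, 12), (3, 13), (4, 14), (5, 13),
]
labels, centroids = kmeans(points, k=3)
for c, centre in enumerate(centroids):
    print(f"cluster {c}: mean symptoms={centre[0]:.2f}, mean days={centre[1]:.2f}")
```

In a real analysis the two features would normally be put on a comparable scale before clustering, and the number of clusters would be chosen from the data rather than fixed in advance.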
The unsupervised clustering identified three severity risk profile groups. Cluster 0 (mildest) had an average of 1.2 symptoms (95% CI 0.0 to 5.0) and a mean symptom duration of 1.26 days (95% CI 0.0 to 9.0), cluster 1 had 2.27 symptoms (95% CI 1.0 to 6.0) lasting 3.47 days (95% CI 1.0 to 12.0), while cluster 2 (strongest symptom expression) exhibited 3.41 symptoms (95% CI 2.0 to 7.0) over 5.52 days (95% CI 0.0 to 16.0). Feature importance analysis showed that age was the most important predictor, followed by the variant of infection, influenza vaccination and the presence of comorbidities. The analysis revealed that younger children, unvaccinated individuals, those infected with Omicron and those with comorbidities were at higher risk of experiencing a greater number and longer duration of symptoms.
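To make the idea behind the feature attribution concrete, the sketch below computes exact Shapley values by enumerating feature subsets. It is not the SHAP library and not the study's model: `toy_risk_score`, its coefficients, the binary feature names and the single reference patient used as a baseline are all hypothetical assumptions chosen for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature."""
    names = list(x)
    n = len(names)

    def value(subset):
        # Features in `subset` take the patient's values; the rest stay at baseline.
        mixed = {f: (x[f] if f in subset else baseline[f]) for f in names}
        return model(mixed)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_f = value(set(subset) | {f})
                without_f = value(set(subset))
                total += weight * (with_f - without_f)
        phi[f] = total
    return phi


def toy_risk_score(p):
    # Hypothetical score, NOT fitted to any data: additive terms plus one
    # interaction between young age and Omicron infection.
    return (
        2.0 * p["age_under_5"]
        + 1.0 * p["omicron"]
        + 0.5 * (1 - p["flu_vaccinated"])
        + 0.5 * p["comorbidity"]
        + 1.0 * p["age_under_5"] * p["omicron"]
    )


patient = {"age_under_5": 1, "omicron": 1, "flu_vaccinated": 0, "comorbidity": 1}
baseline = {"age_under_5": 0, "omicron": 0, "flu_vaccinated": 1, "comorbidity": 0}

phi = shapley_values(toy_risk_score, patient, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:.2f}")
```

The contributions sum to the difference between the patient's score and the baseline score, and the interaction term is split equally between the two features involved. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on model-specific or approximate algorithms.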
Our classification model has the potential to give clinicians insight into children's COVID-19 risk profiles using readily available data. This approach can support public health by clarifying disease burden and improving patient care strategies. Furthermore, it underscores the importance of integrating risk classification models into the monitoring and management of infectious diseases.