Sumon Md Shaheenur Islam, Hossain Md Sakib Abrar, Al-Sulaiti Haya, Yassine Hadi M, Chowdhury Muhammad E H
Department of Electrical Engineering, Qatar University, Doha P.O. Box 2713, Qatar.
Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada.
Metabolites. 2025 Jan 11;15(1):44. doi: 10.3390/metabo15010044.
Respiratory viruses, including influenza, RSV, and the virus that causes COVID-19, cause a range of respiratory infections. Distinguishing these viruses relies on diagnostic methods such as PCR testing, but overlapping symptoms and the emergence of new strains make this difficult. Advanced diagnostics are therefore crucial for accurate detection and effective management. This study used nasopharyngeal metabolome data to classify five scenarios: control vs. RSV, control vs. Influenza A, control vs. COVID-19, control vs. all respiratory viruses, and COVID-19 vs. Influenza A/RSV. We proposed a stacking-based ensemble technique that integrates the three best-performing ML models from the initial results, improving prediction accuracy by combining the strengths of multiple base learners. Feature ranking and standard scaling were applied during preprocessing, and SMOTE was used to address class imbalance, thus enhancing model robustness. SHAP analysis identified the metabolites that most strongly influence positive predictions, thereby providing insight into candidate diagnostic markers. Our approach not only outperformed existing methods but also revealed the dominant features for predicting COVID-19, including Lysophosphatidylcholine acyl C18:2, Kynurenine, Phenylalanine, Valine, Tyrosine, and Aspartic Acid (Asp). This study demonstrates the effectiveness of combining nasopharyngeal metabolome data with stacking-based ensemble techniques for predicting respiratory virus scenarios. The proposed approach enhances prediction accuracy, provides insight into key diagnostic markers, and offers a robust framework for managing respiratory infections.
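The preprocessing steps named in the abstract (standard scaling and SMOTE) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `standard_scale`, `smote`, and `balance`, the choice of `k`, and the toy feature values are all assumptions made for illustration, and the code uses only the Python standard library rather than whatever toolkit the study relied on.

```python
import math
import random


def standard_scale(rows):
    """Scale each column to zero mean and unit variance (population std)."""
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / len(rows)
        stds.append(math.sqrt(var) or 1.0)  # guard against constant columns
    return [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself)
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: math.dist(base, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class until both classes have equal counts."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(X) - len(minority)) - len(minority)
    new_rows = smote(minority, max(n_new, 0), k=k, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)


# Hypothetical toy data: two metabolite-like features, 6 controls (0), 3 positives (1).
X = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1], [1.1, 5.3], [1.3, 4.9], [1.0, 5.2],
     [2.0, 3.0], [2.2, 3.1], [1.9, 2.8]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_scaled = standard_scale(X)
X_bal, y_bal = balance(X_scaled, y)
```

Each synthetic sample lies on the line segment between a minority sample and one of its nearest minority neighbours, which is the core idea of SMOTE. In a real evaluation, both the scaling statistics and the oversampling would normally be fitted on the training folds only, so that no information from held-out samples leaks into the model.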