Qaiser Ariba, Manzoor Sobia, Hashmi Asraf Hussain, Javed Hasnain, Zafar Anam, Ashraf Javed
Molecular Virology Lab, National University of Science and Technology (NUST), Atta-ur-Rehman School of Applied Biosciences (ASAB), Islamabad, Pakistan.
Institute of Biomedical and Genetic Engineering (IBGE), KRL Hospital, Islamabad, Pakistan.
Adv Virol. 2024 Oct 14;2024:5588127. doi: 10.1155/2024/5588127. eCollection 2024.
There is a dire need to establish active dengue surveillance to continuously detect cases, identify circulating serotypes, and determine the disease burden of dengue fever (DF) in the country and region. Predicting dengue PCR results using machine learning (ML) models represents a significant advancement in pre-emptive healthcare measures. This study outlines the process of data preprocessing and model selection, along with the underlying mechanism of each algorithm used to predict dengue PCR outcomes. We analyzed data from 300 suspected dengue patients in Islamabad and Rawalpindi, Pakistan, from August to October 2023. NS1 antigen ELISA, IgM and IgG antibody tests, and serotype-specific real-time polymerase chain reaction (RT-PCR) were used to detect the dengue virus (DENV). Representative PCR-positive samples were sequenced by Sanger sequencing to confirm the circulation of various dengue serotypes. Demographic information, serological test results, and hematological parameters were used as inputs to the ML models, with the dengue PCR result serving as the output to be predicted. The models used were logistic regression, XGBoost, LightGBM, random forest, support vector machine (SVM), and CatBoost. Of the 300 patients, 184 (61.33%) were PCR positive. Among the PCR-positive cases, 9 (4.89%), 171 (92.93%), and 4 (2.17%) were infected with serotypes 1, 2, and 3, respectively. Of the PCR-positive patients, 147 (79.89%) were male and 37 (20.11%) were female, with a mean age of 33 ± 16 years. In addition, the mean platelet count was 75,447, the mean leukocyte count was 4,189.02, and the mean hematocrit was 46.05%. The SVM was the best-performing ML model for predicting RT-PCR results, with 71.4% accuracy, 97.4% recall, and 71.6% precision. Hyperparameter tuning improved the recall to 100%.
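The case counts and percentages reported above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python; the counts are taken from the abstract, while the variable names and the `percent` helper are illustrative assumptions and not part of the study's code.

```python
# Counts as reported in the abstract.
TOTAL_PATIENTS = 300
PCR_POSITIVE = 184
SEROTYPE_COUNTS = {"serotype 1": 9, "serotype 2": 171, "serotype 3": 4}
SEX_COUNTS = {"male": 147, "female": 37}


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals (hypothetical helper)."""
    return round(100.0 * part / whole, 2)


# Share of suspected patients who tested PCR positive.
positivity = percent(PCR_POSITIVE, TOTAL_PATIENTS)

# Serotype and sex shares are expressed relative to the PCR-positive cases.
serotype_share = {k: percent(v, PCR_POSITIVE) for k, v in SEROTYPE_COUNTS.items()}
sex_share = {k: percent(v, PCR_POSITIVE) for k, v in SEX_COUNTS.items()}

print(positivity)
print(serotype_share)
print(sex_share)
```

Both the serotype counts and the sex counts sum to 184, so every percentage after the first uses the PCR-positive cases as its denominator.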
Our study documents three circulating serotypes in the capital territory of Pakistan and highlights that the SVM outperformed other models, potentially serving as a valuable tool in clinical settings to aid in the rapid diagnosis of DF.
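The accuracy, recall, and precision quoted for the SVM follow the standard confusion-matrix definitions, with PCR positive as the positive class. The sketch below shows those definitions in Python. The function name and the example counts are hypothetical: the abstract does not report the test-set confusion matrix, so the first call uses made-up numbers purely to illustrate the formulas.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, recall, and precision from confusion-matrix counts.

    tp: PCR-positive patients predicted positive
    fp: PCR-negative patients predicted positive
    fn: PCR-positive patients predicted negative
    tn: PCR-negative patients predicted negative
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Recall: fraction of true PCR positives that the model catches.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Precision: fraction of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Hypothetical counts for illustration only (not from the study).
example = classification_metrics(tp=8, fp=3, fn=2, tn=7)

# Illustrative reference point built from the reported cohort counts:
# labelling all 300 patients positive gives 184 true and 116 false positives.
all_positive = classification_metrics(tp=184, fp=116, fn=0, tn=0)

print(example)
print(all_positive)
```

The second call is only a reference point and says nothing about how the study's model behaves. It shows that recall is best read together with precision and accuracy, because a rule that labels every patient positive also reaches 100% recall, with precision and accuracy equal to the PCR positivity rate.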