Santos Claudia Yang, Tuboi Suely, de Jesus Lopes de Abreu Ariane, Abud Denise Alves, Lobao Neto Abner Augusto, Pereira Ramon, Siqueira Joao Bosco
Takeda Pharmaceuticals Brazil, Av. das Nações Unidas 14401, São Paulo, SP, Brazil.
IQVIA Brazil, Rua Verbo Divino 2001, São Paulo, SP, Brazil.
Heliyon. 2023 May 30;9(6):e16634. doi: 10.1016/j.heliyon.2023.e16634. eCollection 2023 Jun.
Dengue, like other arboviral diseases with a broad clinical spectrum, is easily misdiagnosed as another infectious disease because signs and symptoms overlap. During large outbreaks, severe dengue cases can overwhelm the health care system, so understanding the burden of dengue hospitalizations is important for allocating medical care and public health resources. A machine learning model using data from the Brazilian public healthcare system database and the National Institute of Meteorology (INMET) was developed to estimate potentially misdiagnosed dengue hospitalizations in Brazil. The data were assembled into a linked dataset at the hospitalization level. Random Forest, Logistic Regression, and Support Vector Machine algorithms were then assessed. The dataset was split into training and test sets, and cross-validation was used to select the best hyperparameters for each algorithm. Evaluation was based on accuracy, precision, recall, F1 score, sensitivity, and specificity. The best model was Random Forest, with an accuracy of 85% on the final reviewed test. The model suggests that 3.4% (13,608) of all hospitalizations in the public healthcare system from 2014 to 2020 could have been dengue misdiagnosed as other diseases. The model was helpful in identifying potentially misdiagnosed dengue and might be a useful tool for public health decision makers planning resource allocation.
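The evaluation metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch in plain Python, not the authors' code: the names `ConfusionCounts` and `evaluate` are hypothetical, and the counts are invented for illustration and are not taken from the study. It also makes explicit that recall and sensitivity are the same quantity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion matrix with dengue as the positive class (hypothetical helper)."""
    tp: int  # predicted dengue, labeled dengue
    fp: int  # predicted dengue, labeled other disease
    fn: int  # predicted other disease, labeled dengue
    tn: int  # predicted other disease, labeled other disease


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when a denominator is empty."""
    return numerator / denominator if denominator else 0.0


def evaluate(c: ConfusionCounts) -> dict:
    """Compute the metrics listed in the abstract from confusion counts."""
    total = c.tp + c.fp + c.fn + c.tn
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    return {
        "accuracy": safe_div(c.tp + c.tn, total),
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,  # sensitivity is another name for recall
        "specificity": safe_div(c.tn, c.tn + c.fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Invented counts for illustration only; these are NOT results from the study.
metrics = evaluate(ConfusionCounts(tp=70, fp=15, fn=30, tn=85))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting specificity alongside recall matters for this kind of task: a model that flags too many non-dengue hospitalizations as dengue would inflate the estimated number of misdiagnosed cases even if its recall were high.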