Keshavamurthy Ravikiran, Dixon Samuel, Pazdernik Karl T, Charles Lauren E
Pacific Northwest National Laboratory, Richland, WA 99354, USA.
Paul G. Allen School for Global Health, Washington State University, Pullman, WA 99164, USA.
One Health. 2022 Oct 1;15:100439. doi: 10.1016/j.onehlt.2022.100439. eCollection 2022 Dec.
The complex, unpredictable nature of pathogen occurrence has required substantial efforts to accurately predict infectious diseases (IDs). Given the rising popularity of Machine Learning (ML) and Deep Learning (DL) techniques and their unique ability to uncover connections within large amounts of diverse data, we conducted a PRISMA systematic review to investigate advances in ID prediction for human and animal diseases using ML and DL. The review examined the types of IDs modeled, the ML and DL techniques utilized, geographical distribution, prediction tasks performed, input features utilized, spatial and temporal scales, error metrics used, computational efficiency, uncertainty quantification, and missing-data handling methods. Among 237 relevant articles published between January 2001 and May 2021, highly contagious human diseases were most often represented, including COVID-19 (37.1%), influenza/influenza-like illnesses (9.3%), dengue (8.9%), and malaria (5.1%). Of the 37 diseases identified, 51.4% were zoonotic, 37.8% were human-only, and 8.1% were animal-only, with only 1.6% being economically significant, non-zoonotic livestock diseases. Despite the number of zoonoses, 86.5% of articles modeled humans, whereas only a few articles (5.1%) included more than one host species. Eastern Asia (32.5%), North America (17.7%), and Southern Asia (13.1%) were the most represented locations. Frequent approaches included tree-based ML (38.4%) and feed-forward neural networks (26.6%). Articles predicted temporal incidence (66.7%), disease risk (38.0%), and/or spatial movement (31.2%). Fewer than 10% of studies addressed uncertainty quantification, computational efficiency, and missing data, all of which are essential to operational use and deployment. This study highlights trends and gaps in ML and DL for ID prediction and provides guidelines for future work to better support biopreparedness and response.
To fully utilize ML and DL for improved ID forecasting, models should include the full disease ecology in a One-Health context, important food and agricultural diseases, underrepresented hotspots, and important metrics required for operational deployment.