Hybrid Machine Learning Approach to Zero-Inflated Data Improves Accuracy of Dengue Prediction.

Affiliations

Center for Marine Environmental Studies (CMES), Ehime University, Matsuyama, Japan.

Graduate School of Science and Engineering, Ehime University, Matsuyama, Ehime, Japan.

Publication information

PLoS Negl Trop Dis. 2024 Oct 21;18(10):e0012599. doi: 10.1371/journal.pntd.0012599. eCollection 2024 Oct.

DOI: 10.1371/journal.pntd.0012599
PMID: 39432557
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11527386/
Abstract

BACKGROUND

Spatiotemporal dengue forecasting using machine learning (ML) can contribute to the development of prevention and control strategies for impending dengue outbreaks. However, because cases are rare, training data for dengue incidence may be zero-inflated, that is, dominated by zero values, which lowers prediction accuracy. This study had three aims: to understand how the spatiotemporal resolution of the data influences the accuracy of dengue incidence prediction with ML models, to understand how that influence differs between quantitative and qualitative predictions of dengue incidence, and to improve the accuracy of dengue incidence prediction with zero-inflated data.

METHODOLOGY

We predicted dengue incidence at six spatiotemporal resolutions and compared their prediction accuracy. Six ML algorithms were compared: generalized additive models, random forests, conditional inference forest, artificial neural networks, support vector machines and regression, and extreme gradient boosting. Data from 2009 to 2012 were used for training, and data from 2013 were used for model validation with quantitative and qualitative dengue variables. To address the inaccuracy in the quantitative prediction of dengue incidence due to zero-inflated data at fine spatiotemporal scales, we developed a hybrid approach in which the second-stage quantitative prediction is performed only when/where the first-stage qualitative model predicts the occurrence of dengue cases.
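The two-stage logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `hybrid_predict`, `toy_classifier`, `toy_regressor`, the two features (rainfall and mean temperature), and all thresholds and coefficients are hypothetical stand-ins for whichever trained qualitative and quantitative models are used.

```python
from typing import Callable, List, Sequence

Features = Sequence[float]


def hybrid_predict(
    rows: Sequence[Features],
    predicts_occurrence: Callable[[Features], bool],
    predicts_incidence: Callable[[Features], float],
) -> List[float]:
    """Run the quantitative model only where the qualitative model predicts cases."""
    predictions: List[float] = []
    for features in rows:
        if predicts_occurrence(features):
            # Stage 2: quantitative prediction, clipped so incidence is never negative.
            predictions.append(max(0.0, predicts_incidence(features)))
        else:
            # Stage 1 predicted no dengue for this row, so incidence is set to zero.
            predictions.append(0.0)
    return predictions


# Toy stand-ins for trained models (hypothetical rules, not fitted to any data).
# Each row is (rainfall_mm, mean_temperature_c).
def toy_classifier(features: Features) -> bool:
    rainfall, temperature = features
    return rainfall > 100.0 and temperature > 25.0


def toy_regressor(features: Features) -> float:
    rainfall, temperature = features
    return rainfall / 20.0 + (temperature - 25.0) / 2.0


rows = [(20.0, 24.0), (150.0, 28.0), (300.0, 22.0), (200.0, 30.0)]
predictions = hybrid_predict(rows, toy_classifier, toy_regressor)
print(predictions)  # [0.0, 9.0, 0.0, 12.5]
```

Passing the two stages in as callables keeps the sketch independent of any particular algorithm, so either stage could in principle be any of the six model families compared in the study.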

PRINCIPAL FINDINGS

At higher resolutions, the dengue incidence data were zero-inflated and therefore insufficient for ML to extract quantitative relationships between dengue incidence and environmental variables. Qualitative models, which treat dengue occurrence as a binary variable, eased the effect of the data distribution. Our novel hybrid approach of combining qualitative and quantitative predictions demonstrated high potential for predicting zero-inflated or rare phenomena, such as dengue.
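To make the link between resolution and zero inflation concrete, the following sketch computes the share of zeros in a count series before and after temporal aggregation, and derives the binary occurrence variable used for qualitative prediction. The weekly counts are made up for illustration, and the helper names `zero_fraction` and `aggregate` are assumptions, not taken from the paper; only the temporal dimension is shown, although the same idea applies to spatial units.

```python
from typing import List, Sequence


def zero_fraction(counts: Sequence[int]) -> float:
    """Share of observations that are exactly zero."""
    return sum(1 for c in counts if c == 0) / len(counts)


def aggregate(counts: Sequence[int], block: int) -> List[int]:
    """Sum consecutive blocks of counts (e.g. 4 weeks into one coarser period)."""
    return [sum(counts[i:i + block]) for i in range(0, len(counts), block)]


# Made-up weekly case counts for one location (illustrative only).
weekly = [0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 3, 0, 0, 0, 1]
four_weekly = aggregate(weekly, 4)

# Qualitative (binary) target: 1 if any case occurred in the period, else 0.
occurrence = [int(c > 0) for c in weekly]

print(zero_fraction(weekly))       # 0.75
print(four_weekly)                 # [1, 0, 5, 1]
print(zero_fraction(four_weekly))  # 0.25
```

In this toy series the finer (weekly) resolution is three-quarters zeros, while the coarser four-week series is one-quarter zeros, which is the kind of shift in data distribution the findings describe.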

SIGNIFICANCE

Our research contributes valuable insights to the field of spatiotemporal dengue prediction and provides an alternative solution to enhance prediction accuracy in zero-inflated data where hurdle or zero-inflated models cannot be applied.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/470a/11527386/5e664ce3626b/pntd.0012599.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/470a/11527386/f535693a91cc/pntd.0012599.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/470a/11527386/f8d9857cee25/pntd.0012599.g002.jpg
