Predicting Measles Outbreaks in the United States: Evaluation of Machine Learning Approaches.

Author information

Ru Boshu, Kujawski Stephanie, Lee Afanador Nelson, Baumgartner Richard, Pawaskar Manjiri, Das Amar

Affiliations

Merck & Co, Inc, West Point, PA, United States.

Merck & Co, Inc, Rahway, NJ, United States.

Publication information

JMIR Form Res. 2023 Apr 4;7:e42832. doi: 10.2196/42832.

PMID: 37014694
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10131820/
Abstract

BACKGROUND

Measles, a highly contagious viral infection, is resurging in the United States, driven by international importation and declining domestic vaccination coverage. Despite this resurgence, measles outbreaks are still rare events that are difficult to predict. Improved methods to predict outbreaks at the county level would facilitate the optimal allocation of public health resources.

OBJECTIVE

We aimed to validate and compare extreme gradient boosting (XGBoost) and logistic regression, 2 supervised learning approaches, to predict the US counties most likely to experience measles cases. We also aimed to assess the performance of hybrid versions of these models that incorporated additional predictors generated by 2 clustering algorithms, hierarchical density-based spatial clustering of applications with noise (HDBSCAN) and unsupervised random forest (uRF).
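The abstract names unsupervised random forest (uRF) without describing it. The standard construction, due to Breiman, trains an ordinary forest to separate the real rows from a synthetic copy whose columns are resampled independently, then treats "how often two rows share a leaf" as a similarity. The sketch below shows those two pieces in plain Python, assuming (the abstract does not confirm it) that the paper's uRF follows this construction. The function names are hypothetical; with scikit-learn, the per-tree leaf indices would come from `RandomForestClassifier.apply`.

```python
import random
from typing import List, Sequence, Tuple


def make_synthetic_contrast(rows: Sequence[Sequence[float]], seed: int = 0) -> List[List[float]]:
    """Sample each column independently from its own marginal distribution.

    The synthetic rows keep every feature's distribution but break the
    dependence between features, so a forest trained to tell real from
    synthetic rows has to learn the joint structure of the real data.
    """
    rng = random.Random(seed)
    n_cols = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_cols)]
    return [[rng.choice(columns[j]) for j in range(n_cols)] for _ in rows]


def real_vs_synthetic(
    rows: Sequence[Sequence[float]], seed: int = 0
) -> Tuple[List[List[float]], List[int]]:
    """Stack real rows (label 1) and synthetic rows (label 0) into one training set."""
    synthetic = make_synthetic_contrast(rows, seed)
    X = [list(r) for r in rows] + synthetic
    y = [1] * len(rows) + [0] * len(synthetic)
    return X, y


def proximity_matrix(leaf_ids: Sequence[Sequence[int]]) -> List[List[float]]:
    """Proximity of samples i and k = fraction of trees where both land in the same leaf.

    leaf_ids[i][t] is the leaf index of (real) sample i in tree t.
    """
    n_trees = len(leaf_ids[0])
    return [
        [sum(a == b for a, b in zip(row_i, row_k)) / n_trees for row_k in leaf_ids]
        for row_i in leaf_ids
    ]
```

Under this assumption, `1 - proximity` serves as a dissimilarity between counties that a standard clustering step can then group.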

METHODS

We constructed a supervised machine learning model based on XGBoost and unsupervised models based on HDBSCAN and uRF. The unsupervised models were used to investigate clustering patterns among counties with measles outbreaks; these clustering data were also incorporated into hybrid XGBoost models as additional input variables. The machine learning models were then compared to logistic regression models with and without input from the unsupervised models.
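The abstract says that cluster assignments were fed to the hybrid models as additional input variables, but it does not specify the encoding. One common choice is to one-hot encode the cluster label and append it to each county's features, which the sketch below assumes. HDBSCAN (for example `sklearn.cluster.HDBSCAN(...).fit_predict(X)` or the `hdbscan` package) labels noise points `-1`, so here noise simply receives its own indicator column. The function name and the optional `categories` argument are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def add_cluster_features(
    features: Sequence[Sequence[float]],
    cluster_labels: Sequence[int],
    categories: Optional[List[int]] = None,
) -> Tuple[List[List[float]], List[int]]:
    """Append one-hot cluster-membership columns to each county's feature row.

    Pass the `categories` returned for the training set when encoding held-out
    counties so that both sets share the same column layout; a label not seen
    during training gets an all-zero block.
    """
    if len(features) != len(cluster_labels):
        raise ValueError("need exactly one cluster label per feature row")
    if categories is None:
        categories = sorted(set(cluster_labels))  # -1 (HDBSCAN noise) sorts first
    position = {label: i for i, label in enumerate(categories)}
    augmented = []
    for row, label in zip(features, cluster_labels):
        one_hot = [0.0] * len(categories)
        if label in position:
            one_hot[position[label]] = 1.0
        augmented.append(list(row) + one_hot)
    return augmented, categories
```

The augmented rows can then go to either XGBoost or logistic regression unchanged. This matches the abstract's setup, in which both model families were tested with and without the unsupervised features.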

RESULTS

Both HDBSCAN and uRF identified clusters that included a high percentage of counties with measles outbreaks. XGBoost and XGBoost hybrid models outperformed logistic regression and logistic regression hybrid models, with the area under the receiver operating characteristic curve values of 0.920-0.926 versus 0.900-0.908, the area under the precision-recall curve values of 0.522-0.532 versus 0.485-0.513, and F scores of 0.595-0.601 versus 0.385-0.426. Logistic regression or logistic regression hybrid models had higher sensitivity than XGBoost or XGBoost hybrid models (0.837-0.857 vs 0.704-0.735) but a lower positive predictive value (0.122-0.141 vs 0.340-0.367) and specificity (0.793-0.821 vs 0.952-0.958). The hybrid versions of the logistic regression and XGBoost models had slightly higher areas under the precision-recall curve, specificity, and positive predictive values than the respective models that did not include any unsupervised features.
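The results above trade sensitivity against positive predictive value (PPV) and specificity. The sketch below defines these quantities from a confusion matrix and uses Bayes' rule to show why specificity dominates PPV when outbreak counties are rare. The abstract does not define its "F score"; the `f1` below is the standard harmonic mean of PPV and sensitivity and may not match the paper's variant. The prevalence in the usage note is a hypothetical value, not a figure from the study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    tp: int  # outbreak counties that were flagged
    fp: int  # flagged counties without measles cases
    tn: int  # unflagged counties without measles cases
    fn: int  # outbreak counties that were missed


def classification_metrics(c: Confusion) -> Dict[str, float]:
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)
    f1 = 2 * c.tp / (2 * c.tp + c.fp + c.fn)  # harmonic mean of PPV and sensitivity
    return {"sensitivity": sensitivity, "specificity": specificity, "ppv": ppv, "f1": f1}


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: expected PPV at a given share of counties with measles cases."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)
```

With an assumed prevalence of 0.05 and rates chosen near the ranges reported above, `ppv_from_rates(0.85, 0.80, 0.05)` is about 0.18, while `ppv_from_rates(0.70, 0.95, 0.05)` is about 0.42. The lower-sensitivity, higher-specificity model flags far fewer false positives.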

CONCLUSIONS

XGBoost provided more accurate predictions of measles cases at the county level compared with logistic regression. The threshold of prediction in this model can be adjusted to align with each county's resources, priorities, and risk for measles. While clustering pattern data from unsupervised machine learning approaches improved some aspects of model performance in this imbalanced data set, the optimal approach for the integration of such approaches with supervised machine learning models requires further investigation.
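The conclusion notes that the prediction threshold can be tuned to each county's resources and priorities. Below are two illustrative rules. The first flags as many counties as there is capacity to follow up. The second picks the highest cut-off that still reaches a target sensitivity on labelled validation data. Both assume the model outputs a risk score per county, such as a predicted probability from `predict_proba`. The function names and rules are hypothetical and do not describe the paper's procedure.

```python
import math
from typing import List, Sequence


def threshold_for_capacity(scores: Sequence[float], k: int) -> float:
    """Cut-off that flags the k highest-risk counties (ties can flag a few more)."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of counties")
    return sorted(scores, reverse=True)[k - 1]


def threshold_for_sensitivity(
    scores: Sequence[float], labels: Sequence[int], target: float
) -> float:
    """Highest cut-off whose sensitivity on labelled validation counties is >= target."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("validation data contain no outbreak counties")
    # Number of outbreak counties that must be flagged; the small epsilon keeps a
    # float product landing a hair above an integer from rounding up needlessly.
    needed = max(1, math.ceil(target * len(positives) - 1e-9))
    return positives[needed - 1]


def flag_counties(scores: Sequence[float], threshold: float) -> List[bool]:
    """Flag every county whose risk score reaches the threshold."""
    return [s >= threshold for s in scores]
```

A capacity-limited county might call `threshold_for_capacity(scores, k)`. A county that prioritizes not missing outbreaks would instead call `threshold_for_sensitivity`, accepting more false positives, which is the trade-off the logistic regression results illustrate.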

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4220/10131820/ea2733756e5a/formative_v7i1e42832_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4220/10131820/0134cb65e178/formative_v7i1e42832_fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4220/10131820/c6a571192fc4/formative_v7i1e42832_fig3.jpg

