Prediction of the number of COVID-19 confirmed cases based on K-means-LSTM.

Authors

Vadyala Shashank Reddy, Betgeri Sai Nethra, Sherer Eric A, Amritphale Amod

Affiliations

Department of Computational Analysis and Modeling, Louisiana Tech University, Ruston, LA, United States.

Department of Chemical Engineering, Louisiana Tech University, Ruston, LA, United States.

Publication information

Array (N Y). 2021 Sep;11:100085. doi: 10.1016/j.array.2021.100085. Epub 2021 Aug 21.

Abstract

COVID-19 is a pandemic disease that began to spread rapidly in the US, with the first case detected on January 19, 2020, in Washington State. After March 9, 2020, cases increased quickly, reaching a total of 25,739 as of April 20, 2020. Although most people with coronavirus (81%, according to the U.S. Centers for Disease Control and Prevention (CDC)) will have few or mild symptoms, others may need a ventilator to breathe or may not be able to breathe on their own at all. SEIR models are broadly applicable for predicting population-level outcomes of a variety of diseases. However, many researchers use these models without validating the necessary hypotheses. Far too many researchers "overfit" the data by using too many predictor variables and small sample sizes to create models. Models developed this way are unlikely to pass a validity check on a separate population or region. Without attempting such validation, the researcher remains unaware that overfitting has occurred. In this paper, we present a combined algorithm that uses region-based similar-day feature selection with XGBoost, K-Means, and long short-term memory (LSTM) neural networks to construct a prediction model (i.e., K-Means-LSTM) for short-term forecasting of COVID-19 cases in Louisiana, USA. The weighted K-Means algorithm, based on extreme gradient boosting, is used to evaluate the similarity between the forecast days and past days. The results show that the K-Means-LSTM method is more accurate, with an RMSE of 601.20, compared with an RMSE of 3615.83 for the SEIR model.
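The similar-day selection and the RMSE metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, and the feature weights are hypothetical, and the weights here only stand in for the feature importances that XGBoost would supply. The LSTM stage is omitted entirely.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def weighted_distance(x, y, weights):
    """Euclidean distance with a per-feature weight (assumed non-negative)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def nearest(point, centroids, weights):
    """Index of the centroid closest to `point` under the weighted distance."""
    return min(range(len(centroids)),
               key=lambda c: weighted_distance(point, centroids[c], weights))


def weighted_kmeans(points, weights, k, iterations=50, seed=0):
    """Plain k-means using the weighted distance; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        labels = [nearest(p, centroids, weights) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids, weights) for p in points]
    return labels, centroids


def similar_days(points, weights, k, query):
    """Indices of past days that fall in the same cluster as `query`."""
    labels, centroids = weighted_kmeans(points, weights, k)
    target = nearest(query, centroids, weights)
    return [i for i, lab in enumerate(labels) if lab == target]


# Toy data: each row is one past day described by two hypothetical features.
days = [[1.0, 0.0], [1.1, 5.0], [0.9, -5.0],
        [10.0, 0.0], [10.2, 5.0], [9.8, -5.0]]
# Hypothetical importances: the second feature is treated as irrelevant.
feature_weights = [1.0, 0.0]
matches = similar_days(days, feature_weights, k=2, query=[9.9, 100.0])
```

In a pipeline of the kind the abstract describes, the days returned by `similar_days` would be the training window passed to the LSTM, and `rmse` would compare its forecasts against observed case counts.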

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b526/8378999/f71b545fec4b/gr1_lrg.jpg
