Department of Public Health and Prevention Sciences, School of Health Sciences, Baldwin Wallace University, Berea, OH 44017, USA.
Department of Geography, University of California Santa Barbara (UCSB), Santa Barbara, CA 93106, USA.
Int J Environ Res Public Health. 2020 Jun 12;17(12):4204. doi: 10.3390/ijerph17124204.
Prediction of the COVID-19 incidence rate is a matter of global importance, particularly in the United States. As of 4 June 2020, more than 1.8 million confirmed cases and over 108,000 deaths had been reported in this country. Few studies have modeled COVID-19 incidence nationwide in the United States, particularly using machine-learning algorithms. We therefore collected and prepared a database of 57 candidate explanatory variables to examine the performance of a multilayer perceptron (MLP) neural network in predicting cumulative COVID-19 incidence rates across the continental United States. Our results indicated that a single-hidden-layer MLP could explain almost 65% of the correlation with ground truth for the holdout samples. Sensitivity analysis conducted on this model showed that the age-adjusted mortality rates of ischemic heart disease, pancreatic cancer, and leukemia, together with two socioeconomic and environmental factors (median household income and total precipitation), are among the most substantial factors for predicting COVID-19 incidence rates. Moreover, results of a logistic regression model indicated that these variables could explain the presence or absence of the hotspots of disease incidence that were identified by Getis-Ord Gi* (p < 0.05) in a geographic information system environment. The findings may provide useful insights for public health decision makers regarding the influence of potential risk factors associated with COVID-19 incidence at the county level.
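To make the hotspot step more concrete, the following is a minimal sketch of the standard textbook form of the Getis-Ord Gi* statistic in plain Python. It is not the authors' implementation, which ran in a geographic information system environment. The function name `getis_ord_gi_star`, the toy values, and the binary chain-neighbor weights are hypothetical choices made only for illustration. The sketch assumes the values are not all identical and that no location is weighted equally with every other location, since either case makes the denominator zero.

```python
import math


def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score for every location.

    values  : list of n observed rates (e.g. county incidence rates)
    weights : n x n spatial weights; weights[i][i] is non-zero so that each
              location counts itself, which is what separates Gi* from Gi
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of the observed values.
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)

    scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(wij * wij for wij in w)
        # Weighted local sum minus what the global mean would predict.
        numerator = sum(wij * xj for wij, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical toy data: six locations on a line, high values at one end.
toy_values = [1, 1, 1, 1, 9, 9]
# Binary weights: a location is a neighbor of itself and of adjacent locations.
toy_weights = [[1 if abs(i - j) <= 1 else 0 for j in range(6)] for i in range(6)]

z_scores = getis_ord_gi_star(toy_values, toy_weights)
# A two-sided p < 0.05 corresponds to |z| > 1.96; positive z marks a hotspot.
hotspots = [i for i, z in enumerate(z_scores) if z > 1.96]
print(hotspots)
```

A binary presence/absence indicator built this way is the kind of outcome a logistic regression on the selected explanatory variables could then be fit against.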