Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America.
PLoS One. 2021 Feb 8;16(2):e0246772. doi: 10.1371/journal.pone.0246772. eCollection 2021.
Since the beginning of the coronavirus disease 2019 (COVID-19) pandemic, daily counts of confirmed cases and deaths have been publicly reported in real time to help control the spread of the virus. However, substantial undocumented infections have obscured the true size of the currently infected population, which is arguably the most critical number for public health policy decisions. We developed a machine learning framework to estimate time courses of actual new COVID-19 cases and current infections in all 50 U.S. states and the 50 most infected countries from reported test results and deaths. Using published epidemiological parameters, our algorithm optimized slowly varying daily ascertainment rates and a time course of currently infected cases each day. Severe under-ascertainment of COVID-19 cases was found to be universal across U.S. states and countries worldwide. In 25 of the 50 countries, actual cumulative cases were estimated to be 5-20 times greater than the confirmed cases. Our estimates of cumulative incidence were in line with the existing seroprevalence rates in 46 U.S. states. Our framework projected that in countries such as Belgium, Brazil, and the U.S., ~10% of the population has been infected once. In U.S. states such as Louisiana, Georgia, and Florida, more than 4% of the population was estimated to be currently infected as of September 3, 2020, while in New York this fraction was 0.12%. Estimating the actual fraction of currently infected people is crucial for defining public health policies, which up to this point may have been misguided by the reliance on confirmed cases.
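The relationship between reported deaths, actual cases, and the ascertainment rate can be illustrated with a minimal sketch. This is not the framework described in the abstract, which optimizes slowly varying daily ascertainment rates. It is a much simpler back-calculation that assumes a single fixed infection fatality rate and a single fixed infection-to-death delay. The function names, the parameters `ifr` and `delay_days`, and all numbers are hypothetical and chosen only for illustration.

```python
def estimate_actual_cases(daily_deaths, ifr, delay_days):
    """Back-project daily deaths to estimated daily new infections.

    Assumption (not from the paper): every death on day t corresponds to
    an infection on day t - delay_days, and a constant fraction `ifr`
    of infections is fatal.
    """
    if not 0 < ifr <= 1:
        raise ValueError("ifr must be in (0, 1]")
    if delay_days < 0:
        raise ValueError("delay_days must be non-negative")
    # Deaths in the first `delay_days` days map to infections that
    # occurred before the series starts, so they are dropped.
    return [deaths / ifr for deaths in daily_deaths[delay_days:]]


def ascertainment_rate(confirmed_cumulative, actual_cumulative):
    """Fraction of estimated actual cases that were confirmed."""
    if actual_cumulative <= 0:
        raise ValueError("actual_cumulative must be positive")
    return confirmed_cumulative / actual_cumulative


# Illustrative numbers only, not real surveillance data.
daily_deaths = [0, 0, 1, 2, 4]
estimated = estimate_actual_cases(daily_deaths, ifr=0.01, delay_days=2)
actual_total = sum(estimated)
rate = ascertainment_rate(confirmed_cumulative=70, actual_cumulative=actual_total)
under_ascertainment_factor = 1 / rate
```

With these made-up inputs, the estimated actual total is about ten times the confirmed count, which falls inside the 5-20 times range reported for 25 of the 50 countries. A fixed delay and fixed fatality rate are crude choices; they are used here only to keep the arithmetic visible.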