Moreno Gage K, Brock-Fisher Taylor, Krasilnikova Lydia A, Schaffner Steve F, Burns Meagan, Casiello Carolyn E, Messer Katelyn S, Petros Brittany, Specht Ivan, DeRuff Katherine C, Siddle Katherine J, Loreth Christine, Fitzgerald Nicholas A, Rooke Heather M, Gabriel Stacey B, Smole Sandra, Wohl Shirlee, Park Daniel J, Madoff Lawrence C, Brown Catherine M, MacInnis Bronwyn L, Sabeti Pardis C
Broad Institute of Harvard and MIT, Cambridge, MA, USA.
Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA.
medRxiv. 2025 Apr 6:2025.04.04.25324273. doi: 10.1101/2025.04.04.25324273.
Despite intensive study, gaps remain in our understanding of SARS-CoV-2 transmission patterns during the COVID-19 pandemic, in part because most large genomic surveillance datasets carry limited contextual metadata. We analyzed over 130,000 SARS-CoV-2 genomes, over 85,000 of them with matched epidemiological data, collected in Massachusetts from November 2021 to January 2023, to investigate viral transmission dynamics at high resolution. The data were drawn from diagnostic testing at >600 facilities representing schools, workplaces, public testing, and other sectors, and encompass the emergence of six major viral lineages, each representing a new outbreak. We found that urban areas were key hubs for the introduction of new lineages and that interurban transmission facilitated spread throughout the state. Young adults, especially those on college campuses, served as early indicators of emerging lineage dominance. Resident-aged populations on college campuses and in nursing homes were more likely to be linked to within-facility transmission, while staff-aged individuals at those facilities were more often linked to their surrounding community. Individuals with recent vaccine doses, including boosters, had a lower likelihood of initiating transmission. This dataset shows the value of linking genomic and epidemiologic data at scale for higher-resolution insights into viral dynamics and their implications for public health strategy.