Applying prospective tree-temporal scan statistics to genomic surveillance data to detect emerging SARS-CoV-2 variants and salmonellosis clusters in New York City.

Author information

Greene Sharon K, Latash Julia, Peterson Eric R, Levin-Rector Alison, Luoma Elizabeth, Wang Jade C, Bernard Kevin, Olsen Aaron, Li Lan, Waechter HaeNa, Mattias Aria, Rohrer Rebecca, Kulldorff Martin

Affiliations

Division of Disease Control, New York City Department of Health and Mental Hygiene, Long Island City, NY, United States.

Independent Biostatistician, Ashford, CT, United States.

Publication information

Int J Epidemiol. 2025 Feb 16;54(2). doi: 10.1093/ije/dyaf032.

Abstract

BACKGROUND

The detection of communicable disease clusters in genomic surveillance data typically involves the application of rule-based signaling criteria, which can be arbitrary. In contrast, scan statistics that are used for spatiotemporal cluster detection can flexibly scan in calendar time, and scan statistics that are used for pharmacovigilance can flexibly scan along hierarchical tree structures that are based on diagnosis codes.

METHODS

New York City (NYC) Health Department staff applied tree-temporal scan statistics prospectively to genomic surveillance data with a hierarchical nomenclature for COVID-19 and salmonellosis cases that were diagnosed among NYC residents. We searched weekly for recent case increases at any granularity, from large phylogenetic branches to small groups of indistinguishable isolates. Using free and open-source TreeScan software, we looked for emerging SARS-CoV-2 variants based on Pango lineages during August 2021-November 2023 and emerging clusters of Salmonella isolates based on allele codes during November 2022-November 2023.
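The weekly search across every level of a lineage hierarchy can be illustrated with a minimal Python sketch. It does not reproduce TreeScan's statistic or its inference; it assumes a simplified Poisson-style log-likelihood ratio conditioned on each node's total cases and each window's total cases, and it omits the significance testing a real analysis would need. The lineage labels (`A`, `A.1`, `A.1.1`, `A.2`), the counts, and the function names are all hypothetical.

```python
from collections import defaultdict
from math import log


def aggregate_up(parent, node_counts):
    """Add each node's weekly case counts to itself and to all of its ancestors."""
    totals = defaultdict(lambda: defaultdict(int))
    for node, weekly in node_counts.items():
        current = node
        while current is not None:
            for week, n in weekly.items():
                totals[current][week] += n
            current = parent.get(current)
    return totals


def poisson_llr(observed, expected, total):
    """Simplified log-likelihood ratio; zero unless cases exceed expectation."""
    if observed <= expected or expected <= 0:
        return 0.0
    llr = observed * log(observed / expected)
    if total > observed:
        llr += (total - observed) * log((total - observed) / (total - expected))
    return llr


def rank_nodes(parent, node_counts, weeks, max_window=4):
    """Score every node for every recent window ending in the latest week."""
    totals = aggregate_up(parent, node_counts)
    week_totals = {w: sum(c.get(w, 0) for c in node_counts.values()) for w in weeks}
    all_cases = sum(week_totals.values())
    ranked = []
    for node, weekly in totals.items():
        node_total = sum(weekly[w] for w in weeks)
        best = (0.0, 0)
        for length in range(1, min(max_window, len(weeks)) + 1):
            window = weeks[-length:]
            observed = sum(weekly[w] for w in window)
            window_total = sum(week_totals[w] for w in window)
            # Expected count if this node's cases followed the overall time trend.
            expected = node_total * window_total / all_cases
            llr = poisson_llr(observed, expected, all_cases)
            if llr > best[0]:
                best = (llr, length)
        ranked.append((node, best[0], best[1]))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical toy hierarchy: child label -> parent label (None marks the root).
parent = {"A": None, "A.1": "A", "A.1.1": "A.1", "A.2": "A"}
weeks = [1, 2, 3, 4, 5, 6]
# Hypothetical cases assigned directly to each label, by week.
node_counts = {
    "A.1": {w: 1 for w in weeks},
    "A.1.1": {5: 3, 6: 9},
    "A.2": {w: 10 for w in weeks},
}
ranked = rank_nodes(parent, node_counts, weeks)
for node, llr, length in ranked:
    print(f"{node}: LLR={llr:.2f}, window={length} week(s)")
```

In this toy data the sublineage with a sudden recent rise ranks above its parent branch, which illustrates how a ranked shortlist can surface the most specific node driving an increase. Deciding whether a top-ranked node is statistically unusual would require an inference step that this sketch leaves out.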

RESULTS

The SARS-CoV-2 Omicron subvariant EG.5.1 first signaled as locally emerging on 22 June 2023, 7 weeks before the World Health Organization designated it as a variant of interest. During 1 year of salmonellosis analyses, TreeScan detected 15 credible clusters that were worth investigating for common exposures and two data-quality issues for correction.

CONCLUSION

A challenge was the maintenance of timely and specific lineage assignments, and a limitation was that genetic distances between tree nodes were not considered. By automatically sifting through genomic data and generating ranked shortlists of nodes with statistically unusual recent case increases, TreeScan assisted in detecting emerging variants and clusters of communicable diseases and in prioritizing them for investigation.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4dd8/11984460/e82e17e6b49f/dyaf032f1.jpg