School of Life Sciences and Department of Statistics, University of Warwick, Coventry CV4 7AL, UK.
Centre for Doctoral Training in Mathematics for Real-World Systems, University of Warwick, Coventry CV4 7AL, UK.
Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac761.
The ability to distinguish imported cases from locally acquired cases has important consequences for the selection of public health control strategies. Genomic data can be useful for this, for example through a phylogeographic analysis in which genomic data from multiple locations are compared to determine likely migration events between locations. However, these methods typically require good samples of genomes from all locations, which are rarely available.
Here, we propose an alternative approach that uses genomic data from the location of interest only. By comparing each new case with previous cases from the same location, we are able to detect imported cases, because their genealogical distribution differs from that of locally acquired cases. We show that, when variations in the size of the local population are accounted for, our method has good sensitivity and excellent specificity for the detection of imports. We applied our method to data simulated under the structured coalescent model and demonstrated relatively good performance even when the local population has the same size as the external population. Finally, we applied our method to several recent genomic datasets from both bacterial and viral pathogens, and showed that it can, in a matter of seconds or minutes, deliver important insights into the number of imports in a geographically limited sample of a pathogen population.
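The idea of flagging cases whose genealogical position is atypical can be illustrated with a toy calculation. The following Python sketch is not the DetectImports implementation. It assumes a constant effective population size and a constant number of local lineages, whereas the method described above accounts for variations in the size of the local population. The names `import_p_value` and `flag_imports`, the threshold `alpha` and the example values are all hypothetical. Under these assumptions, the time a new lineage waits before coalescing with any of k existing local lineages is exponential with rate k/Ne, so an unusually long wait gives a small upper-tail probability.

```python
import math


def import_p_value(wait_time, n_lineages, ne):
    """Upper-tail probability of waiting at least `wait_time` before coalescing.

    Toy model (an assumption, not the published method): constant effective
    population size `ne` and a constant number `n_lineages` of existing local
    lineages, so the waiting time is exponential with rate n_lineages / ne.
    """
    if wait_time < 0 or n_lineages < 1 or ne <= 0:
        raise ValueError("wait_time >= 0, n_lineages >= 1 and ne > 0 are required")
    return math.exp(-n_lineages * wait_time / ne)


def flag_imports(cases, ne, alpha=0.01):
    """Return the names of cases whose waiting time is improbably long.

    `cases` is a list of (name, wait_time, n_lineages) tuples.
    """
    return [name for name, t, k in cases if import_p_value(t, k, ne) < alpha]


# Hypothetical example values, chosen only for illustration.
cases = [
    ("case_A", 0.2, 10),  # coalesces quickly with local lineages
    ("case_B", 0.5, 12),
    ("case_C", 6.0, 15),  # waits far longer than the local model predicts
]
flagged = flag_imports(cases, ne=5.0)
print(flagged)
```

In this sketch only the case with a very long waiting time is flagged. A real analysis would need a dated genealogy and an estimate of how the local population size changes through time.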
The R package DetectImports is freely available from https://github.com/xavierdidelot/DetectImports.
Supplementary data are available at Bioinformatics online.