Whitlock Alexander O B, Bird Brian H, Ghersi Bruno, Davison Andrew J, Hughes Joseph, Nichols Jenna, Vučak Matej, Amara Emmanuel, Bangura James, Lavalie Edwin G, Kanu Marilyn C, Kanu Osman T, Sjodin Anna, Remien Christopher H, Nuismer Scott L
Department of Biological Sciences, University of Idaho, Moscow, ID, USA.
One Health Institute, School of Veterinary Medicine, University of California, Davis, Davis, CA, USA.
R Soc Open Sci. 2023 Mar 22;10(3):221503. doi: 10.1098/rsos.221503. eCollection 2023 Mar.
The rate at which zoonotic viruses spill over into the human population varies significantly over space and time. Remarkably, we do not yet know how much of this variation is attributable to genetic variation within viral populations. This gap in understanding arises because we lack methods of genetic analysis that can be easily applied to zoonotic viruses, where the number of available viral sequences is often limited, and opportunistic sampling introduces significant population stratification. Here, we explore the feasibility of using patterns of shared ancestry to correct for population stratification, enabling genome-wide association methods to identify genetic substitutions associated with spillover into the human population. Using a combination of phylogenetically structured simulations and Lassa virus sequences collected from humans and rodents in Sierra Leone, we demonstrate that existing methods do not fully correct for stratification, leading to elevated error rates. We also demonstrate, however, that the Type I error rate can be substantially reduced by confining the analysis to a less-stratified region of the phylogeny, even in an already-small dataset. Using this method, we detect two candidate single-nucleotide polymorphisms associated with spillover in the Lassa virus polymerase gene and provide generalized recommendations for the collection and analysis of zoonotic viruses.
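The stratification problem described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' pipeline: it swaps in a plain Fisher's exact test for the phylogenetically informed association methods the paper evaluates, and the sample records, clade labels, and function names (`fisher_exact_two_sided`, `site_p_value`) are hypothetical. It assumes one clade was sampled only from humans and carries the allele by descent, so a pooled test shows an association that disappears when the analysis is confined to the clade where both hosts were sampled.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def site_p_value(samples, clades=None):
    """Test allele/host association at one site, optionally within chosen clades."""
    kept = [s for s in samples if clades is None or s["clade"] in clades]

    def count(host, allele):
        return sum(1 for s in kept if s["host"] == host and s["allele"] == allele)

    return fisher_exact_two_sided(
        count("human", 1), count("human", 0),
        count("rodent", 1), count("rodent", 0),
    )


# Hypothetical toy data (an assumption, not from the paper).
# Clade A: sampled from humans only, derived allele fixed by shared ancestry.
# Clade B: both hosts sampled, allele rare and unrelated to host.
samples = (
    [{"clade": "A", "host": "human", "allele": 1}] * 8
    + [{"clade": "B", "host": "human", "allele": 1}] * 1
    + [{"clade": "B", "host": "human", "allele": 0}] * 3
    + [{"clade": "B", "host": "rodent", "allele": 1}] * 1
    + [{"clade": "B", "host": "rodent", "allele": 0}] * 5
)

pooled_p = site_p_value(samples)                    # all sequences
restricted_p = site_p_value(samples, clades={"B"})  # less-stratified clade only
print(f"pooled p = {pooled_p:.3f}, restricted p = {restricted_p:.3f}")
```

In this toy case the pooled test falls below the conventional 0.05 threshold purely because of how clade A was sampled, while the restricted test does not. Restriction discards data, so it trades statistical power for a lower false-positive rate.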