Department of Mathematics and Statistics, Helsinki Institute for Information Technology (HIIT), Faculty of Science, University of Helsinki, FI-00014 Helsinki, Finland.
Department of Computer Science, Aalto University, Espoo, FI-00014, Finland.
Nucleic Acids Res. 2019 Oct 10;47(18):e112. doi: 10.1093/nar/gkz656.
Covariance-based discovery of polymorphisms under co-selective pressure or epistasis has received considerable recent attention in population genomics. Both statistical modeling of the population-level covariation of alleles across the chromosome and model-free testing of dependencies between pairs of polymorphisms have been shown to successfully uncover patterns of selection in bacterial populations. Here we introduce a model-free method, SpydrPick, whose computational efficiency enables analysis at the scale of pan-genomes of many bacteria. SpydrPick incorporates an efficient correction for population structure, which adjusts for the phylogenetic signal in the data without requiring an explicit phylogenetic tree. We also introduce a new type of visualization of the results, similar to the Manhattan plots used in genome-wide association studies, which enables rapid exploration of the identified signals of co-evolution. Simulations demonstrate the usefulness of our method and give some insight into when this type of analysis is most likely to be successful. Application of the method to large population genomic datasets of two major human pathogens, Streptococcus pneumoniae and Neisseria meningitidis, revealed both previously identified and novel putative targets of co-selection related to virulence and antibiotic resistance, highlighting the potential of this approach to drive molecular discoveries, even in the absence of phenotypic data.
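As a rough illustration of what model-free testing of dependencies between pairs of polymorphisms can look like, the sketch below scores every pair of alignment columns and ranks them. It is not the SpydrPick implementation. The abstract does not name the dependency statistic, so mutual information is assumed here as a common choice. The function names `mutual_information` and `rank_pairs`, the `pseudocount` parameter, and the optional per-sequence `weights` (a stand-in for some population-structure correction) are all hypothetical.

```python
import math
from itertools import combinations


def mutual_information(col_a, col_b, weights=None, pseudocount=0.5):
    """Weighted mutual information (in nats) between two allele columns.

    `weights` is a hypothetical per-sequence weight vector; uniform
    weights are used when it is omitted.
    """
    if weights is None:
        weights = [1.0] * len(col_a)
    alleles_a = sorted(set(col_a))
    alleles_b = sorted(set(col_b))

    # Joint allele counts, smoothed so that no cell is exactly zero.
    joint = {(a, b): pseudocount for a in alleles_a for b in alleles_b}
    for a, b, w in zip(col_a, col_b, weights):
        joint[(a, b)] += w
    total = sum(joint.values())

    # Marginal allele frequencies derived from the smoothed joint table.
    p_a = {a: sum(joint[(a, b)] for b in alleles_b) / total for a in alleles_a}
    p_b = {b: sum(joint[(a, b)] for a in alleles_a) / total for b in alleles_b}

    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / total
        mi += p_ab * math.log(p_ab / (p_a[a] * p_b[b]))
    return mi


def rank_pairs(alignment, weights=None):
    """Score all column pairs of an alignment and sort by decreasing MI.

    `alignment` is a list of equal-length sequences (one per isolate).
    Returns a list of (mi, i, j) tuples with i < j.
    """
    columns = list(zip(*alignment))
    scored = [
        (mutual_information(columns[i], columns[j], weights), i, j)
        for i, j in combinations(range(len(columns)), 2)
    ]
    return sorted(scored, reverse=True)


# Toy alignment: columns 0 and 1 always co-vary, column 2 is independent.
toy_alignment = ["AAC", "AAG", "TTC", "TTG", "AAC", "AAG", "TTC", "TTG"]
ranked = rank_pairs(toy_alignment)
```

In this toy alignment the pair of columns 0 and 1 ranks first, because those two sites always carry matching alleles. Scoring all pairs this way is quadratic in the number of sites, which is why computational efficiency matters at pan-genome scale. A real analysis would also need a significance threshold and a principled population-structure correction, neither of which is attempted here.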