Mostowy Rafal, Croucher Nicholas J, Andam Cheryl P, Corander Jukka, Hanage William P, Marttinen Pekka
Department of Infectious Disease Epidemiology, St. Mary's Campus, Imperial College London, London, United Kingdom.
Department of Epidemiology, Harvard TH Chan School of Public Health, Center for Communicable Disease Dynamics, Boston, MA.
Mol Biol Evol. 2017 May 1;34(5):1167-1182. doi: 10.1093/molbev/msx066.
Prokaryotic evolution is affected by horizontal transfer of genetic material through recombination. Inference of an evolutionary tree of bacteria thus relies on accurate identification of the population genetic structure and recombination-derived mosaicism. Rapidly growing databases pose a challenge for computational methods that detect recombinations in bacterial genomes. We introduce a novel algorithm, fastGEAR, which identifies lineages in diverse microbial alignments and detects recombinations both between those lineages and from external origins. The algorithm detects both recent recombinations (affecting a few isolates) and ancestral recombinations between detected lineages (affecting entire lineages), thus providing insight into recombinations affecting deep branches of the phylogenetic tree. In simulations, compared with state-of-the-art methods, fastGEAR had comparable power to detect recent recombinations and outstanding power to detect ancestral ones, often at a fraction of the computational cost. We demonstrate the utility of the method by analyzing a collection of 616 whole genomes of the recombinogenic pathogen Streptococcus pneumoniae, for which the method provided a high-resolution view of recombination across the genome. We examined in detail the three penicillin-binding protein genes across the Streptococcus genus, demonstrating previously undetected genetic exchanges between different species at these loci. Hence, fastGEAR can be readily applied to investigate mosaicism in bacterial genes across multiple species. Finally, fastGEAR correctly identified many known recombination hotspots and pointed to potential new ones. Matlab code and Linux/Windows executables are available at https://users.ics.aalto.fi/~pemartti/fastGEAR/ (last accessed February 6, 2017).
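To make the idea of a "recent recombination" concrete, the following toy sketch flags windows in which an isolate resembles another lineage's consensus more closely than its own. It is an illustration only and not the fastGEAR algorithm, whose internals are not described in this abstract. The function names, the fixed window size, and the mismatch-count criterion are all assumptions made for this example, and lineage labels are taken as given rather than inferred.

```python
from collections import Counter

def consensus(seqs):
    """Majority base at each column of a set of equal-length sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def candidate_segments(alignment, lineages, window=10):
    """Hypothetical helper: return {isolate_index: [(start, end, donor), ...]}.

    A window is flagged when some other lineage's consensus matches the
    isolate strictly better than the consensus of the isolate's own lineage.
    Adjacent flagged windows with the same donor lineage are merged.
    """
    groups = {}
    for seq, lin in zip(alignment, lineages):
        groups.setdefault(lin, []).append(seq)
    cons = {lin: consensus(seqs) for lin, seqs in groups.items()}

    hits = {}
    for i, (seq, own) in enumerate(zip(alignment, lineages)):
        segs = []
        # Non-overlapping windows; a trailing partial window is ignored.
        for start in range(0, len(seq) - window + 1, window):
            end = start + window
            dist = {lin: mismatches(seq[start:end], c[start:end])
                    for lin, c in cons.items()}
            best = min(dist, key=dist.get)
            if best != own and dist[best] < dist[own]:
                if segs and segs[-1][1] == start and segs[-1][2] == best:
                    segs[-1] = (segs[-1][0], end, best)  # extend previous segment
                else:
                    segs.append((start, end, best))
        if segs:
            hits[i] = segs
    return hits

# Synthetic data: the third isolate of lineage "A" carries a block from lineage "B".
alignment = [
    "A" * 30,
    "A" * 30,
    "A" * 10 + "C" * 10 + "A" * 10,
    "C" * 30,
    "C" * 30,
    "C" * 30,
]
lineages = ["A", "A", "A", "B", "B", "B"]
print(candidate_segments(alignment, lineages, window=10))
```

A sketch like this covers only recent events in single isolates. Ancestral recombinations, which the abstract describes as affecting entire lineages, would leave the lineage consensus itself mosaic and so require comparing lineages with each other rather than isolates with a consensus.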