Novartis Vaccines and Diagnostics, Siena, Italy.
PLoS Comput Biol. 2012;8(9):e1002668. doi: 10.1371/journal.pcbi.1002668. Epub 2012 Sep 6.
Advances in high-throughput DNA sequencing technologies have led to an explosion in the number of sequenced bacterial genomes. Comparative sequence analysis frequently reveals evidence of homologous recombination occurring through different mechanisms and at different rates in different species, but the large-scale use of computational methods to identify recombination events is hampered by their high computational cost. Here, we propose a new method to identify recombination events in large datasets of whole-genome sequences. By filtering the gene conservation profiles of a test genome against a panel of strains, the algorithm identifies sets of contiguous genes acquired by homologous recombination. The locations of the recombination breakpoints are determined using a statistical test that accounts for differences in the natural rate of evolution between genes. The algorithm was tested on a dataset of 75 genomes of Staphylococcus aureus and 50 genomes comprising different streptococcal species, and was able to detect intra-species recombination events in S. aureus and in Streptococcus pneumoniae. Furthermore, we found evidence of an inter-species exchange of genetic material between S. pneumoniae and Streptococcus mitis, a closely related commensal species that colonizes the same ecological niche. The method has been implemented in an R package, Reco, which is freely available as supplementary material and provides a rapid screening tool to investigate recombination on a genome-wide scale from sequence data.
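The general idea of screening a conservation profile for runs of contiguous atypical genes can be illustrated with a short sketch. This is not the Reco implementation and does not reproduce its filtering procedure or its breakpoint test; it is a simplified illustration under stated assumptions. It assumes that each genome is summarized as a list of per-gene percent identities against a common reference, that a per-gene z-score against the panel is an acceptable stand-in for a test that accounts for gene-specific evolutionary rates, and that a segment is reported when at least `min_len` neighbouring genes fall below a cutoff. All function names, parameters, and thresholds are hypothetical.

```python
from statistics import mean, stdev


def gene_zscores(test_profile, panel_profiles):
    """Per-gene z-score of the test genome relative to the panel.

    Assumption: each profile is a list of percent identities, one per gene,
    in chromosomal order. Normalizing each gene by its own panel mean and
    standard deviation is a crude way to allow for gene-specific rates.
    Requires at least two panel strains (stdev needs two data points).
    """
    scores = []
    for i, value in enumerate(test_profile):
        column = [profile[i] for profile in panel_profiles]
        mu = mean(column)
        sd = stdev(column)
        # A gene with no variation in the panel carries no signal here.
        scores.append(0.0 if sd == 0 else (value - mu) / sd)
    return scores


def contiguous_runs(flags, min_len=3):
    """Return (start, end) index pairs of runs of True of length >= min_len."""
    runs = []
    start = None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last gene.
    if start is not None and len(flags) - start >= min_len:
        runs.append((start, len(flags) - 1))
    return runs


def candidate_segments(test_profile, panel_profiles, z_cut=-2.0, min_len=3):
    """Flag unusually divergent genes, then keep only contiguous runs."""
    z = gene_zscores(test_profile, panel_profiles)
    return contiguous_runs([score <= z_cut for score in z], min_len)


# Toy data (hypothetical): 4 panel strains, 8 genes. Odd-numbered genes are
# naturally less conserved (about 95%) than even-numbered ones (about 99%).
panel = [
    [99, 95, 99, 95, 99, 95, 99, 95],
    [98, 94, 98, 94, 98, 94, 98, 94],
    [99, 95, 99, 95, 99, 95, 99, 95],
    [98, 94, 98, 94, 98, 94, 98, 94],
]
test_genome = [99, 95, 90, 86, 90, 95, 99, 95]

print(candidate_segments(test_genome, panel))  # [(2, 4)]
```

In the toy data, gene 1 has 95% identity and is not flagged because that value is typical for that gene in the panel, whereas genes 2 to 4 are far below their own panel distributions and form one candidate segment. The run boundaries here are only gene indices; locating breakpoints statistically, as the abstract describes, is outside the scope of this sketch.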