Department of Computer Science & Engineering, Texas A&M University, College Station, TX 77843, USA.
BMC Bioinformatics. 2018 Nov 22;19(1):450. doi: 10.1186/s12859-018-2456-z.
Phylogeny estimation for bacteria is likely to reflect their true evolutionary histories only if they are highly clonal. However, in some species recombination events may occur during evolution. Reconstructing a phylogenetic tree from an alignment without considering recombination can be misleading, because the relationships among strains in some parts of the genome might differ from those in other parts. Using a single, global tree can create the appearance of homoplasy in recombined regions. Hence, identifying recombination breakpoints is essential for understanding the evolutionary relationships of isolates within a bacterial population.
We previously developed a method (called ACR) to detect potential breakpoints in an alignment by evaluating the compatibility of polymorphic sites in a sliding window. To assess the statistical significance of candidate breakpoints, we propose an extension of the algorithm (ptACR) that applies a permutation test to generate a null distribution for comparing the average local compatibility. The performance of ptACR is evaluated on both simulated and empirical datasets. Compared with basic ACR, ptACR shows similar sensitivity (true positive rate) but a lower false positive rate and a higher F1 score. When used to analyze a collection of clinical isolates of Staphylococcus aureus, ptACR finds clear evidence of recombination events in this bacterial pathogen, and is able to identify statistically significant boundaries of chromosomal regions with distinct phylogenies.
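The abstract does not give ACR's exact scoring rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes biallelic sites and uses the standard four-gamete test for pairwise compatibility. It also assumes a hypothetical "local compatibility" score at a candidate boundary (the fraction of compatible site pairs that straddle it, with `w` sites on each side) and a permutation null built by shuffling the order of polymorphic sites. All function names (`polymorphic_columns`, `compatible`, `local_compatibility`, `scan`, `permutation_pvalues`) are illustrative.

```python
import random
from itertools import product


def polymorphic_columns(alignment):
    """Return the polymorphic columns of an alignment given as equal-length strings."""
    return [col for col in zip(*alignment) if len(set(col)) > 1]


def compatible(a, b):
    """Four-gamete test (assumes biallelic sites): two sites are compatible,
    i.e. fit one tree without homoplasy, unless all four allele combinations occur."""
    return len(set(zip(a, b))) < 4


def local_compatibility(cols, i, w):
    """Hypothetical score: fraction of compatible pairs with one site in
    cols[i-w:i] and the other in cols[i:i+w]. Requires w <= i <= len(cols) - w."""
    pairs = list(product(cols[i - w:i], cols[i:i + w]))
    return sum(compatible(a, b) for a, b in pairs) / len(pairs)


def scan(cols, w):
    """Score every boundary that has w polymorphic sites on each side."""
    return {i: local_compatibility(cols, i, w) for i in range(w, len(cols) - w + 1)}


def permutation_pvalues(cols, w, n_perm=200, seed=0):
    """Assumed null: shuffle site order, which removes spatial clustering of
    (in)compatible sites while keeping the same set of sites. A low score is
    evidence for a breakpoint, so p = (1 + #null <= obs) / (1 + #null)."""
    rng = random.Random(seed)
    observed = scan(cols, w)
    null, shuffled = [], list(cols)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.extend(scan(shuffled, w).values())
    return {i: (1 + sum(s <= obs for s in null)) / (1 + len(null))
            for i, obs in observed.items()}


if __name__ == "__main__":
    # Toy alignment of 4 sequences: sites 0-5 support the split {0,1}|{2,3},
    # sites 6-11 support the conflicting split {0,2}|{1,3}.
    aln = ["000000" + "000000",
           "000000" + "111111",
           "111111" + "000000",
           "111111" + "111111"]
    cols = polymorphic_columns(aln)
    scores = scan(cols, w=3)
    pvals = permutation_pvalues(cols, w=3)
    best = min(scores, key=scores.get)  # boundary 6, where the score is 0.0
    print(best, scores[best], round(pvals[best], 3))
```

In this toy example, every site pair that straddles position 6 is incompatible, so the local score drops to 0.0 exactly where the two phylogenies meet. The permutation p-value then asks how often a random ordering of the same sites produces an equally low score. With only 12 sites that is not rare, which illustrates why a significance test helps filter out spurious dips.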
ptACR is an accurate and efficient method for identifying genomic regions affected by recombination in bacterial genomes.