Pillonel Trestan, Bertelli Claire, Salamin Nicolas, Greub Gilbert
SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.
Center for Research on Intracellular Bacteria, Institute of Microbiology, University Hospital Center and University of Lausanne, Lausanne, Switzerland.
Int J Syst Evol Microbiol. 2015 Apr;65(Pt 4):1381-1393. doi: 10.1099/ijs.0.000090. Epub 2015 Jan 29.
Bacterial classification is a long-standing problem for taxonomists, and the definition of species itself is constantly debated among specialists. The classification of strict intracellular bacteria such as members of the order Chlamydiales relies mainly on DNA- or protein-based phylogenetic reconstructions, because these organisms exhibit few phenotypic differences and are difficult to culture. The availability of full genome sequences makes it possible to compare how well conserved protein sequences reconstruct the Chlamydiales phylogeny, and thus to identify markers that maximize the phylogenetic signal and the robustness of the inferred tree. In this study, a set of 424 core proteins was identified and concatenated to reconstruct a reference species tree. Although individual protein trees present variable topologies, we detected only a few cases of incongruence with the reference species tree, which were due to horizontal gene transfers. Detailed analysis of the phylogenetic information of individual protein sequences (i) showed that phylogenies based on single randomly chosen core proteins are not reliable and (ii) led to the identification of twenty taxonomically highly reliable proteins, allowing the reconstruction of a robust tree close to the reference species tree. We recommend using these protein sequences to precisely classify newly discovered isolates at the family, genus and species levels.
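To make the idea of a single-protein tree being "close to" or "incongruent with" the reference species tree concrete, here is a minimal sketch that counts the splits (bipartitions) on which two tree topologies disagree, i.e. the Robinson-Foulds distance. This is an illustration only: the abstract does not say which congruence measure the authors used, and the nested-tuple tree encoding, the function names and the five taxon labels are all assumptions made for the example.

```python
def leaf_sets(tree, clades):
    """Return the leaves under `tree`; append every internal clade to `clades`."""
    if isinstance(tree, str):  # a leaf is encoded as a plain string label
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, clades) for child in tree))
    clades.append(leaves)
    return leaves


def bipartitions(tree):
    """Return (set of non-trivial splits, full leaf set) for a nested-tuple tree."""
    clades = []
    all_leaves = leaf_sets(tree, clades)
    anchor = min(all_leaves)  # fixed leaf, so each split has one canonical side
    splits = set()
    for clade in clades:
        side = clade if anchor not in clade else all_leaves - clade
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
    return splits, all_leaves


def rf_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two topologies."""
    splits_a, leaves_a = bipartitions(tree_a)
    splits_b, leaves_b = bipartitions(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("both trees must contain the same taxa")
    return len(splits_a ^ splits_b)


# Hypothetical topologies over five placeholder taxa (not data from the study).
reference = ((("A", "B"), "C"), ("D", "E"))
congruent_protein_tree = (("D", "E"), ("C", ("B", "A")))    # same splits, drawn differently
incongruent_protein_tree = ((("A", "C"), "B"), ("D", "E"))  # C and B swapped

print(rf_distance(reference, congruent_protein_tree))
print(rf_distance(reference, incongruent_protein_tree))
```

A distance of zero means the protein tree has the same unrooted topology as the reference. In practice branch support and branch lengths also matter, and an established phylogenetics library would normally be preferred over a hand-written comparison like this one.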