Rudi Knut, Zimonja Monika, Næs Tormod
Hedmark University College, 2306 Hamar, Norway.
MATFORSK Norwegian Food Research Institute, Osloveien 1, 1430 Ås, Norway.
Int J Syst Evol Microbiol. 2006 Jul;56(Pt 7):1565-1575. doi: 10.1099/ijs.0.63936-0.
Alignment-independent phylogenetic methods have interesting properties for global phylogenetic reconstructions, particularly with respect to speed and accuracy. Here, we present a novel multimer-based alignment-independent bilinear mathematical modelling (AIBIMM) approach for global 16S rRNA gene phylogenetic analyses. In AIBIMM, jackknife cross-validated principal component analyses (PCA) are used to explain the variance in nucleotide n-mer frequency data. We compared AIBIMM with alignment-based distance, maximum-parsimony and maximum-likelihood phylogenetic methods, analysing taxa belonging to the Proteobacteria (n=82), Actinobacteria (n=30) and Archaea (n=7). With the traditional methods, these analyses indicated an attraction between the Actinobacteria and the Archaea, with the two taxa Acidimicrobium and Rubrobacter placed at the root of the tree. AIBIMM, on the other hand, showed that the Actinobacteria were tightly clustered, with Acidimicrobium and Rubrobacter within a distinct subgroup of the Actinobacteria. The application of AIBIMM was further evaluated by analysing full-length 16S rRNA gene sequences for 2818 taxa representing the prokaryotic domains. We obtained a highly structured description of prokaryote diversity. Sample-to-model (Si) distances were also determined for the taxa included in our work. We determined Si distances for models of the six major subgroups of taxa detected in the global analyses, in addition to nested subgroups within the Alphaproteobacteria. The Si-distance evaluation showed a very good separation of the taxa within the models from those outside. We conclude that AIBIMM represents a novel phylogenetic framework suitable for accommodating the current exponential growth of 16S rRNA gene sequences in the public domain.
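The general idea of PCA on n-mer frequency data can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy sequences, the choice of n = 3, the function names and the plain SVD-based PCA are all assumptions made for illustration, and the jackknife cross-validation described in the abstract is omitted. The residual distance shown is a simplified, unnormalised stand-in for a sample-to-model distance, not the Si statistic used in the paper.

```python
from itertools import product

import numpy as np


def nmer_frequencies(seq, n=3):
    """Relative frequencies of all 4**n nucleotide n-mers in one sequence."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=n))}
    counts = np.zeros(len(index))
    seq = seq.upper().replace("U", "T")
    for i in range(len(seq) - n + 1):
        j = index.get(seq[i:i + n])  # windows with ambiguous bases are skipped
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts


def fit_pca(X, n_components=2):
    """Mean-centre X (samples x n-mers) and compute PCA via SVD."""
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2 / np.sum(S ** 2))[:n_components]
    return mean, Vt[:n_components], scores, explained


def residual_distance(x, mean, loadings):
    """Norm of the part of x that the PCA model does not explain
    (assumption: simplified stand-in for a sample-to-model distance)."""
    centred = x - mean
    residual = centred - (centred @ loadings.T) @ loadings
    return float(np.linalg.norm(residual))


# Hypothetical toy sequences, not real 16S rRNA genes.
toy = {
    "taxon_a": "ACGTACGTTAGCCGATAGCTAGGCTA",
    "taxon_b": "ACGTACGATAGCCGATAGCTAGGCTT",
    "taxon_c": "GGGCCCGGGCGCGCCGGGCCCGCGGC",
    "taxon_d": "GGGCCCGGGCGCGCTGGGCCCGCGGA",
}
X = np.vstack([nmer_frequencies(s) for s in toy.values()])
mean, loadings, scores, explained = fit_pca(X, n_components=2)
for name, row in zip(toy, scores):
    print(name, row)
```

In this sketch each taxon becomes one row of n-mer frequencies, so no alignment is needed, and taxa with similar composition end up close together in the score plot. A taxon with a large residual distance to a group's model would be a poor fit to that group.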