Institute for Molecular Bioscience, and ARC Centre of Excellence in Bioinformatics; The University of Queensland; Brisbane, QLD, Australia.
RNA Biol. 2014;11(3):176-85. doi: 10.4161/rna.27505. Epub 2014 Jan 14.
From 1971 to 1985, Carl Woese and colleagues generated oligonucleotide catalogs of 16S/18S rRNAs from more than 400 organisms. Using these incomplete and imperfect data, Carl and his colleagues developed unprecedented insights into the structure, function, and evolution of the large RNA components of the translational apparatus. They recognized a third domain of life, revealed the phylogenetic backbone of bacteria (and its limitations), delineated taxa, and explored the tempo and mode of microbial evolution. For these discoveries to have stood the test of time, oligonucleotide catalogs must carry significant phylogenetic signal; they thus bear re-examination in view of the current interest in alignment-free phylogenetics based on k-mers. Here we consider the aims, successes, and limitations of this early phase of molecular phylogenetics. We computationally generate oligonucleotide sets (e-catalogs) from 16S/18S rRNA sequences, calculate pairwise distances between them based on D2 statistics, compute distance trees, and compare their performance against alignment-based and k-mer trees. Although the catalogs themselves were superseded by full-length sequences, this stage in the development of computational molecular biology remains instructive for us today.
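The e-catalog and distance steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that an e-catalog is obtained by cutting the sequence after every G (mimicking the ribonuclease T1 digestion used for the original catalogs), that only fragments of at least six nucleotides are kept (the `min_len` threshold is an assumption), and that the basic D2 statistic, the sum over shared words of the product of their counts, is converted to a distance by a cosine-style normalization chosen here purely for illustration. The function names and the two toy sequences are hypothetical.

```python
import math
from collections import Counter


def t1_catalog(seq, min_len=6):
    """Build an e-catalog: cut after every G, keep fragments >= min_len.

    Assumption: min_len=6 is an illustrative cutoff, not taken from the paper.
    """
    seq = seq.upper().replace("T", "U")
    fragments, start = [], 0
    for i, base in enumerate(seq):
        if base == "G":
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        # Trailing fragment that does not end in G (the 3' end of the molecule).
        fragments.append(seq[start:])
    return Counter(f for f in fragments if len(f) >= min_len)


def d2(cat_a, cat_b):
    """Basic D2 statistic: sum over words of count_a(word) * count_b(word)."""
    return sum(n * cat_b[word] for word, n in cat_a.items())


def d2_distance(cat_a, cat_b):
    """Distance in [0, 1] from D2 (assumed normalization, for illustration)."""
    norm = math.sqrt(d2(cat_a, cat_a) * d2(cat_b, cat_b))
    if norm == 0:
        return 1.0  # at least one catalog is empty: treat as maximally distant
    return 1.0 - d2(cat_a, cat_b) / norm


# Toy sequences (hypothetical, far shorter than a real 16S rRNA).
cat_a = t1_catalog("AUCCUAGACUUCAGCCAUUACG")
cat_b = t1_catalog("AUCCUAGUUUACCAGCCAUUACG")
shared = d2(cat_a, cat_b)
dist = d2_distance(cat_a, cat_b)
```

Computing `d2_distance` for every pair of catalogs yields a distance matrix that can be passed to any standard distance-based tree method, such as neighbor joining.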