Department of Chemistry, University of California, Berkeley, CA 94720.
Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720.
Proc Natl Acad Sci U S A. 2017 Aug 29;114(35):9391-9396. doi: 10.1073/pnas.1711939114. Epub 2017 Aug 14.
Fungi belong to one of the largest and most diverse kingdoms of living organisms. Evolutionary kinship within a fungal population has so far been inferred mostly from gene-information-based trees ("gene trees"), commonly constructed by a multiple sequence alignment (MSA) method from the degree of difference among the protein or DNA sequences of a small number of highly conserved genes shared across the population. Because each gene evolves under a different evolutionary pressure and on a different time scale, one gene tree for a population may differ from other gene trees for the same population, depending on the subjective selection of genes. Within the last decade, a large number of whole-genome sequences of fungi have become publicly available; these represent, at present, the most fundamental and complete information about each fungal organism. This presents an opportunity to infer kinship among fungi using a whole-genome information-based tree ("genome tree"). The method we used allows comparison of whole-genome information without MSA. It is a variation of a computational algorithm developed to find semantic similarity or plagiarism between two books, in which we represent the whole-genome information of an organism as a book of words without spaces. The genome tree reveals several notable differences from the gene trees, and these differences prompt new discussion of alternative narratives for the evolution of some of the currently accepted fungal groups.
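The "book of words without spaces" idea can be illustrated with a small alignment-free comparison. The abstract does not specify the word length, the profile definition, or the distance measure, so the sketch below makes assumptions throughout: it treats every overlapping substring of length k as a "word", normalizes the word counts into a frequency profile, and compares two profiles with the Jensen-Shannon divergence. The function names `word_profile` and `jensen_shannon_divergence`, the choice of k, and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def word_profile(sequence, k):
    """Relative frequencies of all overlapping length-k words in a sequence.

    Assumption: a "word" is any k-length substring, read with a sliding
    window of step 1, since the text has no spaces to delimit words.
    """
    if k <= 0 or len(sequence) < k:
        raise ValueError("k must be positive and no longer than the sequence")
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency profiles.

    Returns 0 for identical profiles and 1 for profiles with no shared words.
    """
    divergence = 0.0
    for word in set(p) | set(q):
        pw = p.get(word, 0.0)
        qw = q.get(word, 0.0)
        mean = 0.5 * (pw + qw)
        # Terms with zero frequency contribute nothing (0 * log 0 := 0).
        if pw > 0:
            divergence += 0.5 * pw * log2(pw / mean)
        if qw > 0:
            divergence += 0.5 * qw * log2(qw / mean)
    return divergence


if __name__ == "__main__":
    # Toy sequences for illustration only; real input would be whole genomes
    # or proteomes, and k would be far larger than 3.
    genome_a = "ATGCGATACGCTTGAGATGCGATAC"
    genome_b = "ATGCGATACGCTAGAGATGCGTTAC"
    genome_c = "TTTTTTCCCCCCGGGGGGAAAAAAT"
    profiles = {name: word_profile(seq, k=3)
                for name, seq in [("a", genome_a), ("b", genome_b), ("c", genome_c)]}
    for x, y in [("a", "b"), ("a", "c"), ("b", "c")]:
        d = jensen_shannon_divergence(profiles[x], profiles[y])
        print(f"divergence({x}, {y}) = {d:.3f}")
```

A matrix of such pairwise divergences could then be passed to a standard distance-based tree-building method (for example, neighbor joining) to obtain a tree. That step is likewise an assumption here, not a description of the authors' pipeline.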