Institute for Molecular Bioscience, University of Queensland.
Brief Bioinform. 2019 Mar 22;20(2):426-435. doi: 10.1093/bib/bbx067.
We are amidst an ongoing flood of sequence data arising from the application of high-throughput technologies, and a concomitant fundamental revision in our understanding of how genomes evolve individually and within the biosphere. Workflows for phylogenomic inference must accommodate data that are not only much larger than before, but often more error prone and perhaps misassembled, or not assembled in the first place. Moreover, genomes of microbes, viruses and plasmids evolve not only by tree-like descent with modification but also by incorporating stretches of exogenous DNA. Thus, next-generation phylogenomics must address computational scalability while rethinking the nature of orthogroups, the alignment of multiple sequences and the inference and comparison of trees. New phylogenomic workflows have begun to take shape based on so-called alignment-free (AF) approaches. Here, we review the conceptual foundations of AF phylogenetics for the hierarchical (vertical) and reticulate (lateral) components of genome evolution, focusing on methods based on k-mers. We reflect on what seems to be successful, and on where further development is needed.
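To make the idea of a k-mer-based, alignment-free comparison concrete, the following is a minimal sketch and is not taken from the review itself. It assumes one simple choice of measure, the Jaccard distance between the sets of k-mers found in two sequences; the function names (`kmer_set`, `jaccard_distance`, `distance_matrix`), the default `k=3` and the toy sequences are all hypothetical and chosen only for illustration. Practical AF methods typically use larger k and often count-based or statistically corrected measures.

```python
def kmer_set(seq, k):
    """Return the set of all overlapping substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=3):
    """Alignment-free distance: 1 - |shared k-mers| / |all k-mers|.

    Illustrative assumption: the Jaccard distance on k-mer sets is
    only one of many possible k-mer-based measures.
    """
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        # Both sequences are shorter than k; treat them as identical.
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def distance_matrix(seqs, k=3):
    """Pairwise distances for a dict of name -> sequence (no alignment step)."""
    names = sorted(seqs)
    matrix = [[jaccard_distance(seqs[x], seqs[y], k) for y in names]
              for x in names]
    return names, matrix


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    toy = {
        "taxonA": "ACGTACGTGACC",
        "taxonB": "ACGTACGTGACG",
        "taxonC": "TTTTGGGGCCCC",
    }
    names, matrix = distance_matrix(toy, k=3)
    for name, row in zip(names, matrix):
        print(name, [round(d, 3) for d in row])
```

A matrix of this kind could then be passed to a distance-based tree-building method such as neighbor joining. No multiple sequence alignment is computed at any step, which is what makes such approaches attractive for large, error-prone or unassembled data.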