University of Washington Department of Statistics, Box 354322, Seattle, WA 98195-4322, USA.
KBR NASA Ames Research Center, PO Box 1, Moffett Field, CA 94035-1000.
Biostatistics. 2024 Jul 1;25(3):786-800. doi: 10.1093/biostatistics/kxad025.
Microbiome scientists critically need modern tools to explore and analyze microbial evolution. Often this involves studying the evolution of microbial genomes as a whole. However, different genes in a single genome can be subject to different evolutionary pressures, which can result in distinct gene-level evolutionary histories. To address this challenge, we propose to treat estimated gene-level phylogenies as data objects, and present an interactive method for the analysis of a collection of gene phylogenies. We use a local linear approximation of phylogenetic tree space to visualize estimated gene trees as points in low-dimensional Euclidean space, and address important practical limitations of existing related approaches, allowing an intuitive visualization of complex data objects. We demonstrate the utility of our proposed approach through microbial data analyses, including by identifying outlying gene histories in strains of Prevotella, and by contrasting Streptococcus phylogenies estimated using different gene sets. Our method is available as an open-source R package, and assists with estimating, visualizing, and interacting with a collection of bacterial gene phylogenies.
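As a rough illustration of treating gene trees as data objects and plotting them as points in a low-dimensional Euclidean space, the sketch below swaps in a much simpler technique than the one the abstract describes: each tree is summarized by its vector of tip-to-tip path lengths, and the vectors are projected with ordinary principal component analysis. It is not the local linear approximation of tree space used by the authors, and it is not the API of their R package. The function names (tip_distances, embed_trees), the edge-list tree format, and the toy trees are all assumptions made for this example.

```python
import itertools
import numpy as np


def tip_distances(edges, tips):
    """Return the vector of path lengths between every pair of tips.

    `edges` is a list of (node_a, node_b, branch_length) tuples for one tree.
    This edge-list format is a hypothetical choice made for this sketch.
    """
    # Build an undirected adjacency list for the tree.
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))

    def dist_from(src):
        # In a tree there is exactly one path between two nodes,
        # so a plain traversal yields the path lengths.
        dist = {src: 0.0}
        stack = [src]
        while stack:
            node = stack.pop()
            for nxt, w in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + w
                    stack.append(nxt)
        return dist

    all_dist = {t: dist_from(t) for t in tips}
    pairs = itertools.combinations(tips, 2)
    return np.array([all_dist[a][b] for a, b in pairs])


def embed_trees(trees, tips, n_components=2):
    """Project trees into Euclidean space via PCA on tip-distance vectors."""
    X = np.vstack([tip_distances(edges, tips) for edges in trees])
    Xc = X - X.mean(axis=0)  # center each coordinate
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]  # principal component scores


# Toy data: three hypothetical gene trees on the same four tips.
tips = ["A", "B", "C", "D"]
tree1 = [("A", "u", 1.0), ("B", "u", 1.0), ("u", "v", 1.0),
         ("C", "v", 1.0), ("D", "v", 1.0)]   # ((A,B),(C,D))
tree2 = [("A", "u", 1.1), ("B", "u", 1.0), ("u", "v", 1.0),
         ("C", "v", 1.0), ("D", "v", 1.0)]   # same topology, one longer branch
tree3 = [("A", "u", 1.0), ("C", "u", 1.0), ("u", "v", 1.0),
         ("B", "v", 1.0), ("D", "v", 1.0)]   # ((A,C),(B,D)): a different history

coords = embed_trees([tree1, tree2, tree3], tips)
```

In this toy setting the two trees that share a topology land close together, while the tree with a different topology sits apart, which is the kind of pattern one would look for when screening for outlying gene histories.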