Hassler Gabriel W, Gallone Brigida, Aristide Leandro, Allen William L, Tolkoff Max R, Holbrook Andrew J, Baele Guy, Lemey Philippe, Suchard Marc A
Department of Computational Medicine, David Geffen School of Medicine at UCLA, University of California, Los Angeles, United States.
VIB-KU Leuven Center for Microbiology, Leuven, Belgium.
Methods Ecol Evol. 2022 Oct;13(10):2181-2197. doi: 10.1111/2041-210X.13920. Epub 2022 Jun 19.
Biological phenotypes are products of complex evolutionary processes in which selective forces influence multiple biological trait measurements in unknown ways. Phylogenetic comparative methods seek to disentangle these relationships across the evolutionary history of a group of organisms. Unfortunately, most existing methods fail to accommodate high-dimensional data with dozens or even thousands of observations per taxon. Phylogenetic factor analysis offers a solution to the challenge of dimensionality. However, scientists seeking to employ this modeling framework confront numerous modeling and implementation decisions, the details of which pose computational and replicability challenges.
We develop new inference techniques that increase both the computational efficiency and modeling flexibility of phylogenetic factor analysis. To facilitate adoption of these new methods, we present a practical analysis plan that guides researchers through the web of complex modeling decisions. We codify this analysis plan in an automated pipeline that distills the potentially overwhelming array of decisions into a small handful of (typically binary) choices.
We demonstrate the utility of these methods and analysis plan in four real-world problems of varying scales. Specifically, we study floral phenotype and pollination in columbines, domestication in industrial yeast, life history in mammals, and brain morphology in New World monkeys.
General and impactful community employment of these methods requires a data scientific analysis plan that balances flexibility, speed and ease of use, while minimizing model and algorithm tuning. Even in the presence of non-trivial phylogenetic model constraints, we show that one may analytically address latent factor uncertainty in a way that (a) aids model flexibility, (b) accelerates computation (by as much as 500-fold) and (c) decreases required tuning. These efforts coalesce to create an accessible Bayesian approach to high-dimensional phylogenetic comparative methods on large trees.