Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, 650 Charles E. Young Dr. South, Los Angeles, CA 90095-1772, USA.
Department of Ecology and Evolutionary Biology, University of California, 610 Charles E. Young Drive South, Los Angeles, CA 90095-1606, USA.
Syst Biol. 2018 May 1;67(3):384-399. doi: 10.1093/sysbio/syx066.
Phylogenetic comparative methods explore the relationships between quantitative traits while adjusting for shared evolutionary history. This adjustment often occurs through a Brownian diffusion process along the branches of the phylogeny that generates model residuals or the traits themselves. For high-dimensional traits, inferring all pairwise correlations within the multivariate diffusion is limiting. To circumvent this problem, we propose phylogenetic factor analysis (PFA), which assumes that a small, unknown number of independent evolutionary factors arise along the phylogeny and that these factors generate clusters of dependent traits. Set in a Bayesian framework, PFA provides measures of uncertainty on the factor number and groupings, combines both continuous and discrete traits, integrates over missing measurements, and incorporates phylogenetic uncertainty with the help of molecular sequences. We develop Gibbs samplers based on dynamic programming to estimate the PFA posterior distribution; these run over 3-fold faster than for multivariate diffusion and a further order of magnitude more efficiently in the presence of latent traits. We further propose a novel marginal likelihood estimator for previously impractical models with discrete data, and we find that PFA also provides a better fit than multivariate diffusion for evolutionary questions in columbine flower development, placental reproduction transitions, and triggerfish fin morphometry.
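As an illustration of the generative idea summarized above (a few independent factors diffusing along the tree, each driving a cluster of traits), the following is a minimal simulation sketch and not the authors' implementation. It assumes a hypothetical four-taxon tree expressed directly as a tip covariance matrix of shared branch lengths, a hypothetical loadings matrix, and Gaussian noise for continuous traits only. For brevity it uses a dense Cholesky factor instead of the dynamic programming the abstract mentions, and it does no posterior inference.

```python
import numpy as np

def simulate_pfa(tree_cov, loadings, noise_sd, rng):
    """Simulate tip traits as factors @ loadings + noise.

    Each factor column evolves by Brownian diffusion on the tree, so its
    tip values are multivariate normal with covariance tree_cov.
    All names here are illustrative assumptions.
    """
    n_tips = tree_cov.shape[0]
    n_factors, n_traits = loadings.shape
    # The Cholesky factor maps independent normals to tree-correlated tip values.
    chol = np.linalg.cholesky(tree_cov)
    factors = chol @ rng.standard_normal((n_tips, n_factors))
    noise = noise_sd * rng.standard_normal((n_tips, n_traits))
    return factors, factors @ loadings + noise

# Hypothetical tree ((A,B),(C,D)) with root-to-tip height 1.0;
# off-diagonal entries are the branch lengths shared by each pair of tips.
tree_cov = np.array([
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
])

# Two factors, five traits: factor 1 loads on traits 1-2, factor 2 on traits 3-5.
loadings = np.array([
    [1.0, 0.8, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -0.7, 0.5],
])

rng = np.random.default_rng(0)
factors, traits = simulate_pfa(tree_cov, loadings, noise_sd=0.1, rng=rng)
print(traits.shape)
```

The zero blocks in the loadings matrix are what produce clusters of dependent traits in this sketch: traits that share a factor covary across taxa, while traits loading on different factors are independent apart from the shared tree structure.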