Inferring Phenotypic Trait Evolution on Large Trees With Many Incomplete Measurements.

Author information

Hassler Gabriel, Tolkoff Max R, Allen William L, Ho Lam Si Tung, Lemey Philippe, Suchard Marc A

Affiliations

Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, United States.

Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, United States.

Publication information

J Am Stat Assoc. 2022;117(538):678-692. doi: 10.1080/01621459.2020.1799812. Epub 2020 Sep 16.

Abstract

Comparative biologists are often interested in inferring covariation between multiple biological traits sampled across numerous related taxa. To properly study these relationships, we must control for the shared evolutionary history of the taxa to avoid spurious inference. An additional challenge arises as obtaining a full suite of measurements becomes increasingly difficult with increasing taxa. This generally necessitates data imputation or integration, and existing control techniques typically scale poorly as the number of taxa increases. We propose an inference technique that integrates out missing measurements analytically and scales linearly with the number of taxa by using a post-order traversal algorithm under a multivariate Brownian diffusion (MBD) model to characterize trait evolution. We further exploit this technique to extend the MBD model to account for sampling error or non-heritable residual variance. We test these methods to examine mammalian life history traits, prokaryotic genomic and phenotypic traits, and HIV infection traits. We find computational efficiency increases that top two orders-of-magnitude over current best practices. While we focus on the utility of this algorithm in phylogenetic comparative methods, our approach generalizes to solve long-standing challenges in computing the likelihood for matrix-normal and multivariate normal distributions with missing data at scale.
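The post-order computation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It is a minimal example that assumes a fixed, known root mean, a known diffusion covariance, and information-form Gaussian messages. The names `Node`, `message_to_parent`, `combine_children`, and `log_likelihood` are hypothetical, and the tree and trait values are made up for illustration. Missing measurements are marked with `np.nan` and are integrated out analytically rather than imputed. Each node is visited once, so the work grows linearly with the number of taxa.

```python
from dataclasses import dataclass, field
from typing import Optional

import numpy as np


@dataclass
class Node:
    branch_length: float = 0.0                    # length of the branch above this node
    children: list = field(default_factory=list)  # empty list for a tip
    traits: Optional[np.ndarray] = None           # tip measurements; np.nan marks a missing value


def combine_children(node, sigma):
    """Sum the children's messages; the result is a function of this node's state."""
    d = sigma.shape[0]
    J, h, c = np.zeros((d, d)), np.zeros(d), 0.0
    for child in node.children:
        Jc, hc, cc = message_to_parent(child, sigma)
        J, h, c = J + Jc, h + hc, c + cc
    return J, h, c


def message_to_parent(node, sigma):
    """Return (J, h, c) so that the log-likelihood of all observed data below
    `node`, as a function of the parent state p, is c + h @ p - 0.5 * p @ J @ p."""
    d = sigma.shape[0]
    V = node.branch_length * sigma  # Brownian variance accumulated along the branch
    if not node.children:  # tip: only observed dimensions contribute
        J, h, c = np.zeros((d, d)), np.zeros(d), 0.0
        obs = np.flatnonzero(~np.isnan(node.traits))
        if obs.size:
            y = node.traits[obs]
            V_obs = V[np.ix_(obs, obs)]
            A = np.linalg.inv(V_obs)
            J[np.ix_(obs, obs)] = A
            h[obs] = A @ y
            logdet = np.linalg.slogdet(V_obs)[1]
            c = -0.5 * (y @ A @ y + logdet + obs.size * np.log(2 * np.pi))
        return J, h, c
    # Internal node: combine the children, then integrate out this node's state.
    J, h, c = combine_children(node, sigma)
    M = np.eye(d) + V @ J
    W = np.linalg.solve(M, V)  # equals (V^{-1} + J)^{-1} without inverting V
    c_new = c - 0.5 * np.linalg.slogdet(M)[1] + 0.5 * h @ W @ h
    return J - J @ W @ J, h - J @ W @ h, c_new


def log_likelihood(root, sigma, root_mean):
    """Log-likelihood of the observed tip values given a fixed root state."""
    J, h, c = combine_children(root, sigma)
    return c + h @ root_mean - 0.5 * root_mean @ J @ root_mean


# Illustrative tree ((A:1.0, B:0.5):0.7, (C:1.2, D:0.4):0.3) with two traits per tip.
tip_a = Node(1.0, traits=np.array([1.0, np.nan]))     # second trait missing
tip_b = Node(0.5, traits=np.array([0.3, -0.2]))       # fully observed
tip_c = Node(1.2, traits=np.array([np.nan, 0.8]))     # first trait missing
tip_d = Node(0.4, traits=np.array([np.nan, np.nan]))  # nothing observed
tree = Node(children=[Node(0.7, [tip_a, tip_b]), Node(0.3, [tip_c, tip_d])])

sigma = np.array([[1.0, 0.5], [0.5, 2.0]])  # assumed diffusion covariance
mu = np.array([0.2, -0.1])                  # assumed fixed root mean
ll = log_likelihood(tree, sigma, mu)
print(ll)
```

For a tree this small, the result can be checked against the direct multivariate normal density over the observed entries, whose covariance is the Kronecker product of the shared-branch-length matrix and `sigma`. The direct approach requires factorizing a matrix whose size grows with the number of observed measurements, which is the scaling problem the traversal avoids.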

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdc9/9438787/e323efafe9f4/nihms-1643644-f0001.jpg
