Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees.

Authors

Nye Tom M W, Tang Xiaoxian, Weyenberg Grady, Yoshida Ruriko

Affiliations

School of Mathematics and Statistics, Newcastle University, Newcastle upon Tyne NE1 7RU,

Department of Mathematics, Texas A&M University, College Station, Texas 77843,

Publication information

Biometrika. 2017 Dec;104(4):901-922. doi: 10.1093/biomet/asx047. Epub 2017 Sep 27.

DOI: 10.1093/biomet/asx047
PMID: 29422694
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5793493/
Abstract

Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi-dimensionality of the space of possible trees. In Euclidean spaces, principal component analysis is a popular method of reducing high-dimensional data to a low-dimensional representation that preserves much of the sample's structure. However, the space of all phylogenetic trees on a fixed set of species does not form a Euclidean vector space, and methods adapted to tree space are needed. Previous work introduced the notion of a principal geodesic in this space, analogous to the first principal component. Here we propose a geometric object for tree space similar to the kth principal component in Euclidean space: the locus of the weighted Fréchet mean of k + 1 vertex trees when the weights vary over the k-simplex. We establish some basic properties of these objects, in particular showing that they have dimension k, and propose algorithms for projection onto these surfaces and for finding the principal locus associated with a sample of trees. Simulation studies demonstrate that these algorithms perform well, and analyses of two datasets, containing Apicomplexa and African coelacanth genomes respectively, reveal important structure from the second principal components.
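The central construction in the abstract, the locus of a weighted Fréchet mean as the weights range over a simplex, is easiest to see in the Euclidean case. The sketch below is only that Euclidean analogue, not the authors' tree-space algorithm: with squared Euclidean distance the weighted Fréchet mean (the point minimizing the weighted sum of squared distances to the vertices) reduces to the weighted average, so the locus of three vertices is a flat triangle. In tree space no such closed form is assumed to exist, and the means would have to be found iteratively along geodesics. All function names here (`simplex_grid`, `weighted_frechet_mean`, `frechet_locus`) are hypothetical and chosen for illustration.

```python
import itertools
import numpy as np


def simplex_grid(k, steps):
    """Weight vectors on the k-simplex whose entries are multiples of 1/steps."""
    return [
        np.array(counts) / steps
        for counts in itertools.product(range(steps + 1), repeat=k + 1)
        if sum(counts) == steps
    ]


def frechet_objective(x, vertices, weights):
    """Weighted sum of squared Euclidean distances from x to each vertex."""
    vertices = np.asarray(vertices, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * np.sum((vertices - x) ** 2, axis=1)))


def weighted_frechet_mean(vertices, weights):
    """Minimizer of frechet_objective; in Euclidean space, the weighted average."""
    vertices = np.asarray(vertices, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights lie on the simplex
    return weights @ vertices


def frechet_locus(vertices, steps=10):
    """Sample the locus of the weighted mean over a grid on the k-simplex."""
    vertices = np.asarray(vertices, dtype=float)
    k = len(vertices) - 1  # k + 1 vertices give a k-simplex of weights
    return np.array(
        [weighted_frechet_mean(vertices, w) for w in simplex_grid(k, steps)]
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vertices = rng.normal(size=(3, 5))  # k + 1 = 3 illustrative points in R^5
    locus = frechet_locus(vertices, steps=4)
    # The centred locus points span a k-dimensional subspace (here k = 2).
    dim = np.linalg.matrix_rank(locus - locus[0], tol=1e-9)
    print(locus.shape, dim)
```

In this flat setting the sampled locus has dimension k, mirroring the dimension result stated in the abstract. The interest of the tree-space version is that the same definition yields a surface that can bend across the boundaries between tree topologies.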

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/24cd/5793493/a6dc3996ac83/asx047f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/24cd/5793493/4be5bfe59824/asx047f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/24cd/5793493/4fd4b9100cf0/asx047f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/24cd/5793493/a328058c237f/asx047f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/24cd/5793493/91b10a2387eb/asx047f5.jpg