Computational Tools for Evaluating Phylogenetic and Hierarchical Clustering Trees.

Authors

Chakerian John, Holmes Susan

Affiliations

Palantir Technologies.

Stanford University, Stanford, CA 94305.

Publication information

J Comput Graph Stat. 2012;21(3):581-599. doi: 10.1080/10618600.2012.640901. Epub 2012 Aug 16.

Abstract

Inferential summaries of tree estimates are useful in the setting of evolutionary biology, where phylogenetic trees have been built from DNA data since the 1960s. In bioinformatics, psychometrics, and data mining, hierarchical clustering techniques output the same mathematical objects, and practitioners have similar questions about the stability and "generalizability" of these summaries. This article describes the implementation of the geometric distance between trees developed by Billera, Holmes, and Vogtmann (2001), which is equally applicable to phylogenetic trees and hierarchical clustering trees, and shows some of its applications in evaluating tree estimates. In particular, since Billera et al. (2001) have shown that the space of trees is nonpositively curved (a CAT(0) space), a collection of trees can naturally be represented as a tree. We compare this representation to the Euclidean approximations of treespace made available through both a classical multidimensional scaling and a kernel multidimensional scaling of the matrix of the distances between trees. We also provide applications of the distances between trees to hierarchical clustering trees constructed from microarrays. Our method gives a new way of evaluating the influence of both certain columns (positions, variables, or genes) and certain rows (species, observations, or arrays) on the construction of such trees. It can also provide a way of detecting heterogeneous mixtures in the input data. Supplementary materials for this article are available online.
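
The Euclidean approximation mentioned in the abstract starts from a matrix of pairwise distances between trees. As a minimal sketch of classical multidimensional scaling on such a matrix, the Python below double-centers the squared distances and extracts leading eigenvectors by power iteration. All names (`double_center`, `top_eigenpair`, `classical_mds`, `toy_distances`) are hypothetical, the toy matrix is invented for illustration and is not data from the article, and power iteration is used here only to keep the example dependency-free; it is not claimed to be the authors' implementation.

```python
import math

def double_center(d):
    """Return B = -1/2 * J * D^2 * J for a symmetric distance matrix d."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]

def top_eigenpair(b, iters=500):
    """Power iteration for the dominant eigenpair of a symmetric matrix.

    Assumption: the dominant eigenvalue is positive and well separated,
    which holds when the distances are close to Euclidean.
    """
    n = len(b)
    v = [1.0 + 0.1 * i for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # matrix is numerically zero
            return 0.0, [0.0] * n
        v = [x / norm for x in w]
    # Rayleigh quotient keeps the sign of the eigenvalue
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * bv[i] for i in range(n))
    return lam, v

def classical_mds(d, k=2):
    """Embed n objects in k dimensions from their pairwise distances."""
    n = len(d)
    b = double_center(d)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        lam, v = top_eigenpair(b)
        if lam <= 1e-9:  # no positive variance left to explain
            break
        scale = math.sqrt(lam)
        for i in range(n):
            coords[i][axis] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

# Hypothetical distances between four trees (a 2 x 1 rectangle in disguise)
r5 = math.sqrt(5.0)
toy_distances = [
    [0.0, 2.0, 1.0, r5],
    [2.0, 0.0, r5, 1.0],
    [1.0, r5, 0.0, 2.0],
    [r5, 1.0, 2.0, 0.0],
]

for point in classical_mds(toy_distances, k=2):
    print([round(x, 3) for x in point])
```

Because the toy distances are exactly Euclidean, the two-dimensional embedding reproduces them up to rotation and reflection. Real tree-to-tree distances generally are not Euclidean, so some eigenvalues of the centered matrix can be negative and the embedding is only an approximation.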

Similar articles

3
Tropical Density Estimation of Phylogenetic Trees.
IEEE/ACM Trans Comput Biol Bioinform. 2024 Nov-Dec;21(6):1855-1863. doi: 10.1109/TCBB.2024.3420815. Epub 2024 Dec 10.
4
Geodesics to characterize the phylogenetic landscape.
PLoS One. 2023 Jun 23;18(6):e0287350. doi: 10.1371/journal.pone.0287350. eCollection 2023.
5
A fast algorithm for computing geodesic distances in tree space.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Jan-Mar;8(1):2-13. doi: 10.1109/TCBB.2010.3.
6
Normalizing Kernels in the Billera-Holmes-Vogtmann Treespace.
IEEE/ACM Trans Comput Biol Bioinform. 2017 Nov-Dec;14(6):1359-1365. doi: 10.1109/TCBB.2016.2565475. Epub 2016 May 10.
7
Robust Analysis of Phylogenetic Tree Space.
Syst Biol. 2022 Aug 10;71(5):1255-1270. doi: 10.1093/sysbio/syab100.
8
On the quality of tree-based protein classification.
Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.

Cited by

3
Statistical summaries of unlabelled evolutionary trees.
Biometrika. 2023 Apr 26;111(1):171-193. doi: 10.1093/biomet/asad025. eCollection 2024 Mar.
5
Testing for genetic mutation of seasonal influenza virus.
J Appl Stat. 2021 Sep 29;50(1):1-18. doi: 10.1080/02664763.2021.1978955. eCollection 2023.
6
Robust Analysis of Phylogenetic Tree Space.
Syst Biol. 2022 Aug 10;71(5):1255-1270. doi: 10.1093/sysbio/syab100.
10
Distance metrics for ranked evolutionary trees.
Proc Natl Acad Sci U S A. 2020 Nov 17;117(46):28876-28886. doi: 10.1073/pnas.1922851117. Epub 2020 Nov 2.

References

1
phangorn: phylogenetic analysis in R.
Bioinformatics. 2011 Feb 15;27(4):592-3. doi: 10.1093/bioinformatics/btq706. Epub 2010 Dec 17.
2
A fast algorithm for computing geodesic distances in tree space.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Jan-Mar;8(1):2-13. doi: 10.1109/TCBB.2010.3.
7
Statistics for phylogenetic trees.
Theor Popul Biol. 2003 Feb;63(1):17-32. doi: 10.1016/s0040-5809(02)00005-9.
9
Statistically based postprocessing of phylogenetic analysis by clustering.
Bioinformatics. 2002;18 Suppl 1:S285-93. doi: 10.1093/bioinformatics/18.suppl_1.s285.
10
MRBAYES: Bayesian inference of phylogenetic trees.
Bioinformatics. 2001 Aug;17(8):754-5. doi: 10.1093/bioinformatics/17.8.754.
