The phylogenetic Kantorovich-Rubinstein metric for environmental sequence samples.

Authors

Evans Steven N, Matsen Frederick A

Affiliation

University of California at Berkeley, USA.

Publication details

J R Stat Soc Series B Stat Methodol. 2012 Jun 1;74(3):569-592. doi: 10.1111/j.1467-9868.2011.01018.x. Epub 2012 Feb 15.

Abstract

It is now common to survey microbial communities by sequencing nucleic acid material extracted in bulk from a given environment. Comparative methods are needed that indicate the extent to which two communities differ given data sets of this type. UniFrac, which gives a somewhat ad hoc phylogenetics-based distance between two communities, is one of the most commonly used tools for these analyses. We provide a foundation for such methods by establishing that, if we equate a metagenomic sample with its empirical distribution on a reference phylogenetic tree, then the weighted UniFrac distance between two samples is just the classical Kantorovich-Rubinstein, or earth mover's, distance between the corresponding empirical distributions. We demonstrate that this Kantorovich-Rubinstein distance and extensions incorporating uncertainty in the sample locations can be written as a readily computable integral over the tree, we develop L^p Zolotarev-type generalizations of the metric, and we show how the p-value of the resulting natural permutation test of the null hypothesis 'no difference between two communities' can be approximated by using a Gaussian process functional. We relate the L^2 case to an analysis-of-variance type of decomposition, finding that the distribution of its associated Gaussian functional is that of a computable linear combination of independent [Formula: see text] random variables.
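As a rough illustration of the "readily computable integral over the tree", the sketch below handles the simplest setting, in which each sample's mass sits on nodes of a rooted tree. Under that assumption the integral reduces to a sum over edges of branch length times the absolute difference in the mass lying below the edge, raised to the power p for the Zolotarev-type variant. The function names, the dictionary-based tree encoding, and the toy tree are all hypothetical choices made for this example, not the authors' implementation. The permutation test shown is a plain Monte Carlo label shuffle, not the Gaussian process approximation described in the abstract.

```python
import random
from collections import defaultdict

def kr_distance(parent, length, mass_p, mass_q, p=1.0):
    """Sum over edges of length * |P(below) - Q(below)|**p, then the 1/p root.

    parent: dict child -> parent (the root has no entry).
    length: dict child -> length of the edge above that child.
    mass_p, mass_q: dict node -> probability mass placed at that node.
    Assumes p >= 1 and that mass sits on nodes only (illustrative setup).
    """
    def depth(node):
        d = 0
        while node in parent:
            node = parent[node]
            d += 1
        return d

    # Signed mass difference at each node; later accumulated up the tree.
    diff = defaultdict(float)
    for node in set(mass_p) | set(mass_q):
        diff[node] = mass_p.get(node, 0.0) - mass_q.get(node, 0.0)

    total = 0.0
    # Deepest nodes first, so each subtree total is complete when it is used.
    for node in sorted(parent, key=depth, reverse=True):
        total += length[node] * abs(diff[node]) ** p
        diff[parent[node]] += diff[node]
    return total ** (1.0 / p)

def empirical(sample):
    """Empirical distribution of a list of node placements."""
    weights = defaultdict(float)
    for node in sample:
        weights[node] += 1.0 / len(sample)
    return weights

def permutation_p_value(parent, length, sample_a, sample_b, n_perm=999, seed=0):
    """Monte Carlo permutation p-value obtained by shuffling sample labels."""
    rng = random.Random(seed)
    observed = kr_distance(parent, length, empirical(sample_a), empirical(sample_b))
    pooled = list(sample_a) + list(sample_b)
    k = len(sample_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        d = kr_distance(parent, length, empirical(pooled[:k]), empirical(pooled[k:]))
        if d >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy tree: root R with children A and z; A has leaves x and y.
PARENT = {"A": "R", "z": "R", "x": "A", "y": "A"}
LENGTH = {"A": 1.0, "z": 2.0, "x": 1.0, "y": 1.0}

if __name__ == "__main__":
    # All mass at x versus all mass at z: the path x -> A -> R -> z has length 4.
    print(kr_distance(PARENT, LENGTH, {"x": 1.0}, {"z": 1.0}))
    print(permutation_p_value(PARENT, LENGTH, ["x", "x", "y"], ["z", "z", "y"]))
```

With p = 1 the value agrees with the earth mover's cost on the toy tree: moving a unit of mass from x to z costs the path length 4. Accumulating subtree differences from the leaves upward keeps the computation to a single pass over the edges once the nodes are ordered by depth.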

Similar articles

1
The phylogenetic Kantorovich-Rubinstein metric for environmental sequence samples.
J R Stat Soc Series B Stat Methodol. 2012 Jun 1;74(3):569-592. doi: 10.1111/j.1467-9868.2011.01018.x. Epub 2012 Feb 15.
3
A Commentary on Diversity Measures UniFrac in Very Small Sample Size.
Evol Bioinform Online. 2019 Apr 23;15:1176934319843515. doi: 10.1177/1176934319843515. eCollection 2019.
9
On Markov Earth Mover's Distance.
Int J Image Graph. 2014 Oct;14(4):1450016. doi: 10.1142/S0219467814500168.
10
Embedding Signals on Graphs with Unbalanced Diffusion Earth Mover's Distance.
Proc IEEE Int Conf Acoust Speech Signal Process. 2022 May;2022:5647-5651. doi: 10.1109/icassp43922.2022.9746556. Epub 2022 Apr 27.

Cited by

1
Phylogenetic association analysis with conditional rank correlation.
Biometrika. 2023 Dec 1;111(3):881-902. doi: 10.1093/biomet/asad075. eCollection 2024 Sep.
9
Lagrange-NG: The next generation of Lagrange.
Syst Biol. 2023 May 19;72(1):242-248. doi: 10.1093/sysbio/syad002.
10
A Statistical Perspective on the Challenges in Molecular Microbial Biology.
J Agric Biol Environ Stat. 2021 Jun;26(2):131-160. doi: 10.1007/s13253-021-00447-1. Epub 2021 Mar 24.
