The phylogenetic Kantorovich-Rubinstein metric for environmental sequence samples.

Author information

Evans Steven N, Matsen Frederick A

Affiliations

University of California at Berkeley, USA.

Publication information

J R Stat Soc Series B Stat Methodol. 2012 Jun 1;74(3):569-592. doi: 10.1111/j.1467-9868.2011.01018.x. Epub 2012 Feb 15.

DOI:10.1111/j.1467-9868.2011.01018.x

PMID:22844205

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC3405733/

Abstract

It is now common to survey microbial communities by sequencing nucleic acid material extracted in bulk from a given environment. Comparative methods are needed that indicate the extent to which two communities differ given data sets of this type. UniFrac, which gives a somewhat ad hoc phylogenetics-based distance between two communities, is one of the most commonly used tools for these analyses. We provide a foundation for such methods by establishing that, if we equate a metagenomic sample with its empirical distribution on a reference phylogenetic tree, then the weighted UniFrac distance between two samples is just the classical Kantorovich-Rubinstein, or earth mover's, distance between the corresponding empirical distributions. We demonstrate that this Kantorovich-Rubinstein distance and extensions incorporating uncertainty in the sample locations can be written as a readily computable integral over the tree, we develop L^p Zolotarev-type generalizations of the metric, and we show how the p-value of the resulting natural permutation test of the null hypothesis 'no difference between two communities' can be approximated by using a Gaussian process functional. We relate the L^2 case to an analysis-of-variance type of decomposition, finding that the distribution of its associated Gaussian functional is that of a computable linear combination of independent chi-squared random variables, each with one degree of freedom.
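The "integral over the tree" in the abstract can be illustrated with a minimal sketch. It assumes a simplified setting that the abstract does not spell out: all probability mass sits on nodes of a rooted tree (no mass spread along edges), p is at least 1, and the tree is passed as plain dictionaries. Under those assumptions the integral reduces to a sum over edges of edge length times |P(below edge) - Q(below edge)|^p, and p = 1 gives the earth mover's distance that the abstract identifies with weighted UniFrac. The function names `kr_distance` and `permutation_p_value` are hypothetical, and the permutation test below is a plain Monte Carlo version, not the Gaussian process approximation developed in the paper.

```python
import random
from collections import defaultdict


def kr_distance(parent, length, mass_p, mass_q, p=1.0):
    """L^p Kantorovich-Rubinstein-type distance on a rooted tree (sketch).

    parent: dict mapping each non-root node to its parent
    length: dict mapping each non-root node to the length of the edge above it
    mass_p, mass_q: dicts mapping nodes to probability mass (each sums to 1)
    Assumes p >= 1 and that all mass sits on nodes.
    """
    # diff[v] = P(subtree below edge above v) - Q(same subtree)
    diff = defaultdict(float)
    for mass, sign in ((mass_p, 1.0), (mass_q, -1.0)):
        for node, m in mass.items():
            # every edge on the path from this node to the root carries the mass
            while node in parent:
                diff[node] += sign * m
                node = parent[node]
    total = sum(length[v] * abs(d) ** p for v, d in diff.items())
    return total ** (1.0 / p)


def empirical(reads):
    """Empirical distribution of a list of read placements (node labels)."""
    dist = defaultdict(float)
    for r in reads:
        dist[r] += 1.0 / len(reads)
    return dist


def permutation_p_value(parent, length, reads_a, reads_b,
                        p=1.0, n_perm=999, seed=0):
    """Monte Carlo permutation test of 'no difference between two communities'."""
    observed = kr_distance(parent, length,
                           empirical(reads_a), empirical(reads_b), p)
    rng = random.Random(seed)
    pooled = list(reads_a) + list(reads_b)
    n_a = len(reads_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign reads to the two samples at random
        d = kr_distance(parent, length,
                        empirical(pooled[:n_a]), empirical(pooled[n_a:]), p)
        if d >= observed:
            hits += 1
    # add-one correction keeps the estimate strictly positive
    return (hits + 1) / (n_perm + 1)


# Toy tree: root r with children a and b; a has leaf children x and y.
parent = {"a": "r", "b": "r", "x": "a", "y": "a"}
length = {"a": 1.0, "b": 2.0, "x": 0.5, "y": 0.5}

d1 = kr_distance(parent, length, {"x": 1.0}, {"b": 1.0})
d2 = kr_distance(parent, length, {"x": 0.5, "y": 0.5}, {"x": 1.0})
p_same = permutation_p_value(parent, length, ["x", "y"], ["x", "y"], n_perm=99)
```

In the toy tree, `d1` is the path length from x to b, because a unit of mass has to travel that whole path, and `d2` is the cost of moving half the mass from x to y. Walking from each node up to the root is chosen for brevity; a single post-order pass over the tree would be the linear-time alternative.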

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7f8/3405733/d43c4d3c2927/nihms334326f1.jpg

Similar articles

EMDUniFrac: exact linear time computation of the UniFrac metric and identification of differentially abundant organisms.

J Math Biol. 2018 Oct;77(4):935-949. doi: 10.1007/s00285-018-1235-9. Epub 2018 Apr 25.

A Commentary on Diversity Measures UniFrac in Very Small Sample Size.

Evol Bioinform Online. 2019 Apr 23;15:1176934319843515. doi: 10.1177/1176934319843515. eCollection 2019.

Kantorovich-Rubinstein distance and approximation for non-local Fokker-Planck equations.

Chaos. 2021 Nov;31(11):111104. doi: 10.1063/5.0065704.

Finding phylogeny-aware and biologically meaningful averages of metagenomic samples: UniFrac.

bioRxiv. 2023 Feb 3:2023.02.02.526854. doi: 10.1101/2023.02.02.526854.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

A linear optimal transportation framework for quantifying and visualizing variations in sets of images.

Int J Comput Vis. 2013 Jan 1;101(2):254-269. doi: 10.1007/s11263-012-0566-z.

Finding phylogeny-aware and biologically meaningful averages of metagenomic samples: L2UniFrac.

Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i57-i65. doi: 10.1093/bioinformatics/btad238.

On Markov Earth Mover's Distance.

Int J Image Graph. 2014 Oct;14(4):1450016. doi: 10.1142/S0219467814500168.

Embedding signals on graphs with unbalanced diffusion earth mover's distance.

Proc IEEE Int Conf Acoust Speech Signal Process. 2022 May;2022:5647-5651. doi: 10.1109/icassp43922.2022.9746556. Epub 2022 Apr 27.

Cited by

Phylogenetic association analysis with conditional rank correlation.

Biometrika. 2023 Dec 1;111(3):881-902. doi: 10.1093/biomet/asad075. eCollection 2024 Sep.

Interpretable metric learning in comparative metagenomics: The adaptive Haar-like distance.

PLoS Comput Biol. 2024 May 20;20(5):e1011543. doi: 10.1371/journal.pcbi.1011543. eCollection 2024 May.

Microbiome preterm birth DREAM challenge: Crowdsourcing machine learning approaches to advance preterm birth research.

Cell Rep Med. 2024 Jan 16;5(1):101350. doi: 10.1016/j.xcrm.2023.101350. Epub 2023 Dec 21.

MaLiAmPi enables generalizable and taxonomy-independent microbiome features from technically diverse 16S-based microbiome studies.

Cell Rep Methods. 2023 Nov 20;3(11):100639. doi: 10.1016/j.crmeth.2023.100639. Epub 2023 Nov 7.

Finding phylogeny-aware and biologically meaningful averages of metagenomic samples: L2UniFrac.

Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i57-i65. doi: 10.1093/bioinformatics/btad238.

Abundance and phylogenetic distribution of eight key enzymes of the phosphorus biogeochemical cycle in grassland soils.

Environ Microbiol Rep. 2023 Oct;15(5):352-369. doi: 10.1111/1758-2229.13159. Epub 2023 May 10.

Optimal Estimation of Wasserstein Distance on A Tree with An Application to Microbiome Studies.

J Am Stat Assoc. 2021;116(535):1237-1253. doi: 10.1080/01621459.2019.1699422. Epub 2020 Jan 23.

Finding phylogeny-aware and biologically meaningful averages of metagenomic samples: UniFrac.

bioRxiv. 2023 Feb 3:2023.02.02.526854. doi: 10.1101/2023.02.02.526854.

Lagrange-NG: The next generation of Lagrange.

Syst Biol. 2023 May 19;72(1):242-248. doi: 10.1093/sysbio/syad002.

A Statistical Perspective on the Challenges in Molecular Microbial Biology.

J Agric Biol Environ Stat. 2021 Jun;26(2):131-160. doi: 10.1007/s13253-021-00447-1. Epub 2021 Mar 24.

References

Edge principal components and squash clustering: using the special structure of phylogenetic placement data for sample comparison.

PLoS One. 2013;8(3):e56859. doi: 10.1371/journal.pone.0056859. Epub 2013 Mar 11.

Performance, accuracy, and Web server for evolutionary placement of short sequence reads under maximum likelihood.

Syst Biol. 2011 May;60(3):291-302. doi: 10.1093/sysbio/syr010. Epub 2011 Mar 23.

pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree.

BMC Bioinformatics. 2010 Oct 30;11:538. doi: 10.1186/1471-2105-11-538.

UniFrac: an effective distance metric for microbial community comparison.

ISME J. 2011 Feb;5(2):169-72. doi: 10.1038/ismej.2010.133. Epub 2010 Sep 9.

Microbial community resemblance methods differ in their ability to detect biologically relevant patterns.

Nat Methods. 2010 Oct;7(10):813-9. doi: 10.1038/nmeth.1499. Epub 2010 Sep 5.

Introducing W.A.T.E.R.S.: a workflow for the alignment, taxonomy, and ecology of ribosomal sequences.

BMC Bioinformatics. 2010 Jun 12;11:317. doi: 10.1186/1471-2105-11-317.

Transcriptomic analysis of a marine bacterial community enriched with dimethylsulfoniopropionate.

ISME J. 2010 Nov;4(11):1410-20. doi: 10.1038/ismej.2010.62. Epub 2010 May 13.

Metagenomic sequencing of an in vitro-simulated microbial community.

PLoS One. 2010 Apr 16;5(4):e10209. doi: 10.1371/journal.pone.0010209.

QIIME allows analysis of high-throughput community sequencing data.

Nat Methods. 2010 May;7(5):335-6. doi: 10.1038/nmeth.f.303. Epub 2010 Apr 11.

Alignment and clustering of phylogenetic markers--implications for microbial diversity studies.

BMC Bioinformatics. 2010 Mar 24;11:152. doi: 10.1186/1471-2105-11-152.
