Optimal Estimation of Wasserstein Distance on A Tree with An Application to Microbiome Studies.

Authors

Wang Shulei, Cai T Tony, Li Hongzhe

Affiliations

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104.

Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104.

Publication information

J Am Stat Assoc. 2021;116(535):1237-1253. doi: 10.1080/01621459.2019.1699422. Epub 2020 Jan 23.

Abstract

The weighted UniFrac distance, a plug-in estimator of the Wasserstein distance of read counts on a tree, has been widely used to measure the microbial community difference in microbiome studies. Our investigation however shows that such a plug-in estimator, although intuitive and commonly used in practice, suffers from potential bias. Motivated by this finding, we study the problem of optimal estimation of the Wasserstein distance between two distributions on a tree from the sampled data in the high-dimensional setting. The minimax rate of convergence is established. To overcome the bias problem, we introduce a new estimator, referred to as the moment-screening estimator on a tree (MET), by using implicit best polynomial approximation that incorporates the tree structure. The new estimator is computationally efficient and is shown to be minimax rate-optimal. Numerical studies using both simulated and real biological datasets demonstrate the practical merits of MET, including reduced biases and statistically more significant differences in microbiome between the inactive Crohn's disease patients and the normal controls.
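To make the plug-in estimator mentioned in the abstract concrete, here is a minimal Python sketch. It relies on the standard closed form of the Wasserstein distance on a tree: the sum over edges of the edge length times the absolute difference in probability mass lying below that edge. The plug-in version replaces the unknown distributions with empirical proportions from the read counts. The function name, the parent-array tree encoding, and the toy counts are assumptions made for illustration and are not taken from the paper. This is the estimator the abstract describes as potentially biased, not MET, whose construction the abstract does not spell out.

```python
from typing import Sequence


def plugin_tree_wasserstein(
    parent: Sequence[int],
    edge_length: Sequence[float],
    counts_p: Sequence[float],
    counts_q: Sequence[float],
) -> float:
    """Plug-in estimate of the Wasserstein distance between two samples on a tree.

    Assumed (hypothetical) encoding: node 0 is the root, parent[v] < v for
    every other node v, and edge_length[v] is the length of the edge joining
    v to parent[v]. parent[0] and edge_length[0] are ignored.
    """
    n = len(parent)
    if not (len(edge_length) == len(counts_p) == len(counts_q) == n):
        raise ValueError("all inputs must have one entry per node")
    if any(not (0 <= parent[v] < v) for v in range(1, n)):
        raise ValueError("nodes must be ordered so that parent[v] < v")

    total_p, total_q = sum(counts_p), sum(counts_q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each sample needs a positive total count")

    # diff[v] starts as the difference in empirical proportions at node v and
    # is accumulated into the difference in mass over the subtree rooted at v.
    diff = [counts_p[v] / total_p - counts_q[v] / total_q for v in range(n)]

    distance = 0.0
    # Visiting nodes in reverse order guarantees children come before parents.
    for v in range(n - 1, 0, -1):
        distance += edge_length[v] * abs(diff[v])
        diff[parent[v]] += diff[v]
    return distance


# Toy tree: root 0; internal node 1 under the root; leaves 2 and 3 under
# node 1; leaf 4 directly under the root.
parent = [-1, 0, 1, 1, 0]
edge_length = [0.0, 1.0, 0.5, 0.5, 2.0]
sample_p = [0, 0, 10, 0, 10]  # read counts per node, first community
sample_q = [0, 0, 0, 5, 15]   # read counts per node, second community

print(plugin_tree_wasserstein(parent, edge_length, sample_p, sample_q))  # 1.125
```

The single reverse pass visits each edge once, so the cost is linear in the number of nodes. Because the estimate uses absolute differences of empirical proportions, sampling noise tends to inflate it at low read depth, which is consistent with the bias the abstract sets out to correct.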


