Optimal Estimation of Wasserstein Distance on A Tree with An Application to Microbiome Studies.

Authors

Wang Shulei, Cai T Tony, Li Hongzhe

Affiliations

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104.

Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104.

Publication information

J Am Stat Assoc. 2021;116(535):1237-1253. doi: 10.1080/01621459.2019.1699422. Epub 2020 Jan 23.

DOI: 10.1080/01621459.2019.1699422
PMID: 36860698
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9974173/

Abstract

The weighted UniFrac distance, a plug-in estimator of the Wasserstein distance of read counts on a tree, has been widely used to measure the microbial community difference in microbiome studies. Our investigation however shows that such a plug-in estimator, although intuitive and commonly used in practice, suffers from potential bias. Motivated by this finding, we study the problem of optimal estimation of the Wasserstein distance between two distributions on a tree from the sampled data in the high-dimensional setting. The minimax rate of convergence is established. To overcome the bias problem, we introduce a new estimator, referred to as the moment-screening estimator on a tree (MET), by using implicit best polynomial approximation that incorporates the tree structure. The new estimator is computationally efficient and is shown to be minimax rate-optimal. Numerical studies using both simulated and real biological datasets demonstrate the practical merits of MET, including reduced biases and statistically more significant differences in microbiome between the inactive Crohn's disease patients and the normal controls.
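
To make the plug-in idea concrete, the following is a minimal Python sketch, not the authors' code and not the MET estimator. It relies on the standard closed form for the Wasserstein-1 distance on a rooted tree: the sum, over edges, of edge length times the absolute difference in probability mass lying below that edge. The plug-in estimate replaces the unknown probabilities with empirical read proportions. The function name `plugin_tree_wasserstein`, the dictionary-based tree encoding, and the toy tree are all assumptions made for illustration.

```python
from collections import defaultdict


def plugin_tree_wasserstein(parent, length, counts_p, counts_q):
    """Plug-in estimate of the Wasserstein-1 distance on a rooted tree.

    parent:   dict mapping every non-root node to its parent node
    length:   dict mapping every non-root node to the length of the edge above it
    counts_p: dict mapping nodes to read counts in the first sample
    counts_q: dict mapping nodes to read counts in the second sample
    """
    n_p = sum(counts_p.values())
    n_q = sum(counts_q.values())

    # Signed difference of empirical proportions placed at each node.
    diff = defaultdict(float)
    for node, count in counts_p.items():
        diff[node] += count / n_p
    for node, count in counts_q.items():
        diff[node] -= count / n_q

    def depth(node):
        # Number of edges between the node and the root.
        d = 0
        while node in parent:
            node = parent[node]
            d += 1
        return d

    # Visit nodes deepest first so each node holds its full subtree
    # difference before its own edge is scored.
    total = 0.0
    for node in sorted(parent, key=depth, reverse=True):
        total += length[node] * abs(diff[node])
        diff[parent[node]] += diff[node]
    return total


# Hypothetical toy tree: root R has children A and z; A has leaves x and y.
parent = {"x": "A", "y": "A", "A": "R", "z": "R"}
length = {"x": 1.0, "y": 1.0, "A": 2.0, "z": 3.0}
counts_p = {"x": 2, "y": 2}
counts_q = {"x": 1, "y": 1, "z": 2}

distance = plugin_tree_wasserstein(parent, length, counts_p, counts_q)
print(distance)
```

On this toy input, a quarter of the mass must travel from each of `x` and `y` to `z` along a path of length 6, giving a distance of 3.0. Because every edge contributes an absolute value of a noisy difference, two finite samples drawn from the same distribution will generally yield a positive value even though the true distance is zero, which is one way to see the kind of bias the abstract describes.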
