Transformation and differential abundance analysis of microbiome data incorporating phylogeny.

Affiliations

Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, 200240 Shanghai, China.

SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, 200240 Shanghai, China.

Publication information

Bioinformatics. 2021 Dec 11;37(24):4652-4660. doi: 10.1093/bioinformatics/btab543.

Abstract

MOTIVATION

Microbiome data have proven extremely useful for understanding microbial communities and their impacts in health and disease. Although microbiome analysis methods and standards are evolving rapidly, obtaining meaningful and interpretable results from microbiome studies still requires careful statistical treatment. In particular, many existing and emerging methods for differential abundance (DA) analysis fail to account for the fact that microbiome data are high-dimensional and sparse, compositional, negatively and positively correlated and phylogenetically structured. To better describe microbiome data and improve the power of DA testing, there is still a great need for the continued development of appropriate statistical methodology.

RESULTS

In this article, we propose a model-based approach for microbiome data transformation and a phylogenetically informed procedure for DA testing based on the transformed data. First, we extend the Dirichlet-tree multinomial (DTM) to a zero-inflated DTM for multivariate modeling of microbial counts, addressing data sparsity as well as the correlation and phylogeny among bacterial taxa. Within this framework, and using a Bayesian formulation, we introduce the posterior mean transformation to convert raw counts into non-zero relative abundances that sum to one, accounting for the compositional nature of microbiome data. Second, using the transformed data, we propose adaptive analysis of composition of microbiomes (adaANCOM) for DA testing, which constructs log-ratios adaptively on the tree for each taxon and greatly reduces the computational complexity of ANCOM in high dimensions. Finally, we present extensive simulation studies, an analysis of HMP data across 18 body sites and 2 visits, and an application to a gut microbiome and malnutrition study to investigate the performance of the posterior mean transformation and adaANCOM. Comparisons with ANCOM and other DA testing procedures show that adaANCOM controls the false discovery rate well, allows for easy interpretation of the results, and is computationally efficient for high-dimensional problems.
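As a rough illustration of the idea behind a posterior mean transformation on a tree, the sketch below uses a plain (not zero-inflated) Dirichlet-tree multinomial with known parameters. It is not the adaANCOM implementation: the toy tree `TREE`, the parameters `ALPHA`, and the function names are all hypothetical and chosen only for this example. At each internal node, the branch probability of a child is estimated as (alpha + count) / (sum of alphas + sum of counts), and a leaf's relative abundance is the product of branch probabilities along its path, so zero counts become small positive proportions that still sum to one. A log-ratio at an internal node then compares the total abundance under its two children.

```python
import math

# Hypothetical toy tree: each internal node maps to its two children.
TREE = {"root": ["A", "n1"], "n1": ["B", "C"]}
# Hypothetical Dirichlet parameters per child at each internal node
# (assumed known here; in practice they would be estimated).
ALPHA = {"root": [1.0, 1.0], "n1": [0.5, 0.5]}


def subtree_total(node, values):
    """Sum of leaf values under a node (a leaf returns its own value)."""
    if node not in TREE:
        return values[node]
    return sum(subtree_total(child, values) for child in TREE[node])


def posterior_mean_transform(counts):
    """Leaf proportions as products of posterior-mean branch probabilities."""
    props = {}

    def walk(node, prob):
        if node not in TREE:
            props[node] = prob
            return
        child_counts = [subtree_total(c, counts) for c in TREE[node]]
        denom = sum(ALPHA[node]) + sum(child_counts)
        for child, a, y in zip(TREE[node], ALPHA[node], child_counts):
            walk(child, prob * (a + y) / denom)

    walk("root", 1.0)
    return props


def node_log_ratio(node, props):
    """Log-ratio of total abundance under the left vs. right child of a node."""
    left, right = TREE[node]
    return math.log(subtree_total(left, props) / subtree_total(right, props))


if __name__ == "__main__":
    counts = {"A": 10, "B": 0, "C": 5}  # taxon B has a zero count
    props = posterior_mean_transform(counts)
    print(props)
    print(node_log_ratio("n1", props))
```

Because every branch probability is strictly positive when the Dirichlet parameters are positive, the log-ratios are always defined, which is what makes a transformation of this kind convenient as a preprocessing step for log-ratio based testing.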

AVAILABILITY AND IMPLEMENTATION

The developed R package is available at https://github.com/ZRChao/adaANCOM. For replicability purposes, scripts for our simulations and data analysis are available at https://github.com/ZRChao/Papers_supplementary.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
