Li Shuai, Li Runzhe, Lee John R, Zhao Ni, Ling Wodan
Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States.
Division of Nephrology and Hypertension, Department of Medicine, Weill Medical College of Cornell University, New York, NY, United States.
Front Genet. 2025 Jan 29;15:1494401. doi: 10.3389/fgene.2024.1494401. eCollection 2024.
Identifying bacterial taxa associated with disease phenotypes or clinical treatments over time is critical for understanding the underlying biological mechanisms. Association testing for microbiome data is already challenging because of its complex distribution, which involves features such as sparsity, over-dispersion, and heavy tails. The longitudinal nature of the data adds another layer of complexity: one needs to account for within-subject correlations to avoid biased results. Existing longitudinal differential abundance approaches usually depend on strong parametric assumptions, such as zero-inflated normal or negative binomial distributions. However, complex microbiome data frequently violate these distributional assumptions, leading to inflated false discovery rates. In addition, the existing methods are mostly mean-based and therefore unable to identify heterogeneous associations such as tail events or subgroup effects, which could be important biomedical signals.
We propose ZINQ-L, a zero-inflated quantile approach for longitudinal microbiome differential abundance testing. Hypothesis testing uses a mixed-effects quantile rank-score-based test, which consists of a test in a mixed-effects logistic model for the presence-absence status of the investigated taxon and a series of mixed-effects quantile rank-score tests, adjusted for zero inflation, given its presence. As a regression method with minimal distributional assumptions, it is robust to complex microbiome data, controls the false discovery rate, and can flexibly adjust for important covariates. Its comprehensive examination of the abundance distribution enables the identification of heterogeneous associations, improving testing power.
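To make the two-part structure concrete, the following is a minimal sketch of a simplified, cross-sectional analogue; it is not the ZINQ-L implementation. It assumes independent samples (no random effects, so within-subject correlation is ignored), a single covariate with an intercept-only null model, and no zero-inflation adjustment of the quantile levels. The Cauchy combination of p-values is an assumption of this sketch, since the abstract does not state how the component tests are combined. All function names (`presence_test`, `quantile_rank_score_test`, `two_part_test`, and so on) are hypothetical.

```python
import math


def chi2_1_sf(t):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(t / 2.0))


def score_test(x, resid, var_scale):
    """Score test of covariate x against residuals from an intercept-only null model."""
    xbar = sum(x) / len(x)
    s = sum((xi - xbar) * ri for xi, ri in zip(x, resid))
    v = var_scale * sum((xi - xbar) ** 2 for xi in x)
    if v == 0:
        return 1.0  # no variation to test
    return chi2_1_sf(s * s / v)


def presence_test(y, x):
    """Logistic score test for presence-absence (y > 0) against x."""
    d = [1.0 if yi > 0 else 0.0 for yi in y]
    p = sum(d) / len(d)
    return score_test(x, [di - p for di in d], p * (1.0 - p))


def quantile_rank_score_test(y, x, tau):
    """Quantile rank-score test at level tau, using nonzero values only."""
    pos = [(yi, xi) for yi, xi in zip(y, x) if yi > 0]
    if len(pos) < 2:
        return 1.0
    ys = sorted(yi for yi, _ in pos)
    # Sample tau-th quantile under the intercept-only null model.
    q = ys[min(len(ys) - 1, int(math.floor(tau * len(ys))))]
    psi = [tau - (1.0 if yi < q else 0.0) for yi, _ in pos]
    return score_test([xi for _, xi in pos], psi, tau * (1.0 - tau))


def cauchy_combine(pvals):
    """Cauchy combination of p-values (an assumption of this sketch)."""
    clipped = [min(max(p, 1e-15), 1.0 - 1e-15) for p in pvals]
    t = sum(math.tan((0.5 - p) * math.pi) for p in clipped) / len(clipped)
    return 0.5 - math.atan(t) / math.pi


def two_part_test(y, x, taus=(0.25, 0.5, 0.75)):
    """Combine the presence-absence test with rank-score tests at several quantiles."""
    pvals = [presence_test(y, x)]
    pvals += [quantile_rank_score_test(y, x, tau) for tau in taus]
    return cauchy_combine(pvals)
```

Calling `two_part_test(y, x)` with a list of abundances `y` and a matching list of covariate values `x` returns one combined p-value per taxon. A longitudinal analysis would instead need mixed-effects versions of both components, as the abstract describes.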
Extensive simulation studies and an application to a real kidney transplant microbiome study demonstrate the improved power of ZINQ-L in detecting true signals while controlling false discovery rates.
ZINQ-L is a zero-inflated quantile-based approach for detecting individual taxa associated with outcomes or exposures in longitudinal microbiome studies, providing a robust and powerful option to improve and complement the existing methods in the field.