Yu Jaehong, Hosain Md Mozaffar, Park Taesung
Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea.
Department of Statistics, Seoul National University, Seoul, 08826, Republic of Korea.
Genes Genomics. 2025 Sep 16. doi: 10.1007/s13258-025-01673-4.
Identifying microbiome markers associated with ordered phenotypes, such as disease stages or severity levels, is crucial for understanding disease progression and advancing precision medicine. Despite this importance, most existing methods for differential abundance analysis are designed for binary group comparisons and do not incorporate ordinal information, limiting their ability to capture trends across ordered categories.
This study aims to develop and evaluate statistical methods that explicitly account for ordinal phenotype structure in microbiome data, that address challenges such as sparsity and zero inflation, and that improve the detection of meaningful microbial associations.
In this study, we propose and evaluate three novel approaches specifically tailored for microbiome association analysis with ordered groups: the binary optimal test, the linear trend test, and the proportional odds model-based permutation test (POMp). These methods explicitly account for the ordinal structure of phenotypes and address the sparsity and zero-inflation commonly observed in microbiome data through permutation-based inference. We applied the proposed methods to three publicly available gut microbiome datasets, including two related to obesity and one concerning colorectal cancer.
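The abstract does not give the test statistics, so the following is only a minimal sketch of the general idea behind a permutation-based trend test, not the authors' implementation. It assumes equally spaced integer scores for the ordered groups, a `log1p` transform of the counts, and the absolute Pearson correlation as the statistic. The function names `abs_corr` and `permutation_trend_test` are hypothetical.

```python
import numpy as np


def abs_corr(x, y):
    """Absolute Pearson correlation; returns 0.0 if either vector is constant."""
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return 0.0 if denom == 0 else abs((xc * yc).sum()) / denom


def permutation_trend_test(abundance, stage, n_perm=2000, seed=0):
    """Permutation p-value for a linear trend of one feature across ordered groups.

    abundance : counts of a single microbial feature, one value per sample
    stage     : ordinal group score per sample (assumed equally spaced, e.g. 0, 1, 2)
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(stage, dtype=float)
    # log1p keeps zeros at zero and dampens very large counts (an assumption here)
    y = np.log1p(np.asarray(abundance, dtype=float))

    observed = abs_corr(x, y)
    # Shuffling the group labels breaks any abundance-phenotype association,
    # which gives a null distribution without parametric assumptions.
    null = np.array([abs_corr(rng.permutation(x), y) for _ in range(n_perm)])

    # Add-one correction so the p-value is never exactly zero;
    # the small tolerance guards against floating-point ties.
    return (1 + np.sum(null >= observed - 1e-12)) / (n_perm + 1)


if __name__ == "__main__":
    stage = np.repeat([0, 1, 2], 10)            # three ordered groups, 10 samples each
    counts = stage * 5 + (np.arange(30) % 3)    # toy feature that rises with stage
    print(permutation_trend_test(counts, stage))
```

Because the null distribution comes from relabelling the samples, the p-value does not rely on a distributional model for sparse, zero-inflated counts. A feature that is zero in every sample yields a p-value of 1 in this sketch.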
All three proposed methods identified differentially abundant features (DAFs) with stronger ordinal associations than those identified by existing methods. In particular, POMp consistently outperformed the other approaches in terms of correlation with phenotype order, suggesting its potential to identify biologically relevant markers.
The findings of this study highlight the importance of incorporating ordinal information in microbiome studies and provide robust statistical tools for advancing microbial biomarker discovery in complex disease contexts.