Group-wise normalization in differential abundance analysis of microbiome samples.

Authors

Clark-Boucher Dylan, Coull Brent, Reeder Harrison T, Wang Fenglei, Sun Qi, Starr Jacqueline R, Lee Kyu Ha

Affiliations

Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, United States.

Biostatistics, Massachusetts General Hospital, Boston, MA, United States.

Publication information

ArXiv. 2024 Nov 23:arXiv:2411.15400v1.

PMID:39936028

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11812596/

Abstract

MOTIVATION

A key challenge in differential abundance analysis of microbial samples is that the counts for each sample are compositional, resulting in biased comparisons of the absolute abundance across study groups. Normalization-based differential abundance analysis methods rely on external normalization factors that account for the compositionality by standardizing the counts onto a common numerical scale. However, existing normalization methods have struggled to maintain the false discovery rate in settings where the variance or compositional bias is large. This article proposes a novel framework for normalization that can reduce bias in differential abundance analysis by re-conceptualizing normalization as a group-level task. We present two normalization methods within the group-wise framework: group-wise relative log expression (G-RLE) and fold-truncated sum scaling (FTSS).
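As a rough illustration of the compositional bias described above, the sketch below applies the conventional sample-wise RLE (median-of-ratios) normalization to a toy example. This is not the G-RLE or FTSS method proposed in the article, whose details are not given in this abstract. The function name `rle_size_factors`, the two toy samples, and the fixed sequencing depth are all hypothetical and chosen only for illustration.

```python
import numpy as np

def rle_size_factors(counts):
    """Sample-wise RLE (median-of-ratios) size factors.

    counts: array of shape (n_samples, n_taxa) with non-negative counts.
    This is the conventional per-sample method, not the group-wise
    variant proposed in the article.
    """
    counts = np.asarray(counts, dtype=float)
    # Keep only taxa observed in every sample so that the logs are finite.
    keep = np.all(counts > 0, axis=0)
    if not keep.any():
        raise ValueError("no taxon is observed in every sample")
    log_counts = np.log(counts[:, keep])
    # Per-taxon log geometric mean across samples.
    log_geo_mean = log_counts.mean(axis=0)
    # Size factor: median ratio of a sample's counts to the geometric means.
    return np.exp(np.median(log_counts - log_geo_mean, axis=1))

# Hypothetical absolute abundances: only taxon 0 differs (10-fold increase).
absolute_a = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
absolute_b = np.array([1000.0, 200.0, 300.0, 400.0, 500.0])

# Sequencing observes proportions only. For simplicity we use expected
# (non-integer) counts at a common depth instead of sampled reads.
depth = 10_000
counts = np.vstack([
    depth * absolute_a / absolute_a.sum(),
    depth * absolute_b / absolute_b.sum(),
])

# With equal depth, total-sum scaling leaves the raw ratio unchanged:
# the four unchanged taxa appear to drop to 0.625 of their level.
tss_ratio = counts[1] / counts[0]

# After dividing by RLE size factors, the ratios are 10 for taxon 0
# and 1 for the unchanged taxa.
factors = rle_size_factors(counts)
rle_ratio = (counts[1] / factors[1]) / (counts[0] / factors[0])

print("TSS ratio:", tss_ratio)
print("RLE ratio:", rle_ratio)
```

The median recovers the true fold changes here only because most taxa are unchanged between the two toy samples. When many taxa shift or the counts are noisy, a per-sample median can be badly off, which is consistent with the difficulty the abstract attributes to existing normalization methods.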

RESULTS

G-RLE and FTSS achieve higher statistical power for identifying differentially abundant taxa than existing methods in model-based and synthetic data simulation settings, while maintaining the false discovery rate in challenging scenarios where existing methods suffer. The best results are obtained from using FTSS normalization with the differential abundance analysis method MetagenomeSeq.

AVAILABILITY AND IMPLEMENTATION

Code for implementing the methods and replicating the analysis can be found at our GitHub page.
