TaxaNorm: a novel taxa-specific normalization approach for microbiome data.

Author information

Wang Ziyue, Lloyd Dillon, Zhao Shanshan, Motsinger-Reif Alison

Affiliations

Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC, 27709, USA.

Department of Population Health, NYU Grossman School of Medicine, New York, NY, 10016, USA.

Publication information

BMC Bioinformatics. 2024 Sep 16;25(1):304. doi: 10.1186/s12859-024-05918-z.

DOI: 10.1186/s12859-024-05918-z

PMID: 39285319

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11406911/

Abstract

BACKGROUND

In high-throughput sequencing studies, sequencing depth, the total number of reads per sample, varies across samples. Unequal sequencing depth can obscure true biological signals of interest and prevent direct comparisons between samples. To remove variability due to differential sequencing depth, taxa counts are usually normalized before downstream analysis. However, most existing normalization methods scale counts using size factors that are sample-specific but not taxon-specific, which can over- or under-correct some taxa.
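
To make the over- and under-correction point concrete, here is a minimal Python sketch using noise-free expected counts. It assumes, for illustration only, that each taxon's mean count scales as depth raised to a taxon-specific exponent `b1`, and it uses depth divided by mean depth as a stand-in for a generic sample-specific size factor. The exponents and baseline abundance are made-up values, not estimates from the paper.

```python
import numpy as np

# Hypothetical expected counts: taxon i has mean exp(b0_i) * depth**b1_i.
# b1 == 1 means counts scale proportionally with depth, which is what a single
# sample-specific size factor implicitly assumes; b1 != 1 breaks that assumption.
depth = np.array([1e4, 2e4, 5e4, 1e5])          # sequencing depth per sample (illustrative)
b0 = np.log(np.array([1e-3, 1e-3, 1e-3]))       # baseline relative abundance (illustrative)
b1 = np.array([1.0, 0.7, 1.3])                  # taxon-specific depth exponents (illustrative)

expected = np.exp(b0[:, None]) * depth[None, :] ** b1[:, None]  # taxa x samples

# Sample-specific normalization: every taxon in sample j is divided by the
# same size factor (here simply depth relative to the mean depth).
size_factor = depth / depth.mean()
normalized = expected / size_factor[None, :]

# Residual dependence on depth after normalization: log-log slope per taxon.
# np.polyfit fits each column of the 2-D y argument separately.
slopes = np.polyfit(np.log(depth), np.log(normalized).T, 1)[0]
print(slopes.round(3))  # slopes equal b1 - 1: about 0, -0.3 and 0.3
```

After dividing by the common size factor, only the taxon with `b1 = 1` loses its dependence on depth. The taxon with `b1 = 0.7` still decreases with depth (deep samples are scaled down too much, i.e. over-corrected), while the taxon with `b1 = 1.3` still increases (under-corrected).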

RESULTS

We developed TaxaNorm, a novel normalization method based on a zero-inflated negative binomial model. The method assumes that the effects of sequencing depth on the mean and the dispersion vary across taxa, and its zero-inflation component better captures the excess zeros typical of microbiome data. We also propose two corresponding diagnostic tests for validating the varying sequencing depth effect. In simulations, TaxaNorm performs comparably to existing methods in most downstream-analysis scenarios and achieves higher power in some; in particular, it balances power and false discovery control well. When applied to a real dataset, TaxaNorm showed improved performance in correcting technical bias.
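
The following Python sketch shows one way a per-taxon zero-inflated negative binomial (ZINB) regression on log sequencing depth could be written and fitted. It is a hypothetical illustration, not the TaxaNorm package's code or its exact parameterization. It assumes that the log mean and log dispersion are each linear in centered log depth with a taxon-specific intercept and slope, and that the zero-inflation probability is constant within a taxon. It also uses a likelihood-ratio test of `b1 = 1` (proportional scaling) as a stand-in diagnostic; the paper's two diagnosis tests may be defined differently. The names `zinb_loglik`, `fit_taxon`, `depth_slope_lrt` and `simulate_taxon` are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln
from scipy.stats import chi2


def zinb_loglik(y, log_mu, log_theta, eta):
    """Per-observation ZINB log-likelihood (hypothetical helper).

    log_mu: log NB mean; log_theta: log NB size (dispersion) parameter;
    eta: logit of the structural-zero probability pi.
    """
    theta = np.exp(log_theta)
    log_sum = np.logaddexp(log_theta, log_mu)  # log(theta + mu)
    nb = (gammaln(y + theta) - gammaln(theta) - gammaln(y + 1)
          + theta * (log_theta - log_sum) + y * (log_mu - log_sum))
    log_pi = -np.logaddexp(0.0, -eta)    # log(pi)
    log_1m_pi = -np.logaddexp(0.0, eta)  # log(1 - pi)
    # Zeros come from either the structural-zero part or the NB part.
    return np.where(y == 0, np.logaddexp(log_pi, log_1m_pi + nb), log_1m_pi + nb)


# Box bounds for (b0, b1, g0, g1, eta) keep exp() well away from overflow.
BOUNDS = [(-10, 15), (-5, 5), (-10, 10), (-5, 5), (-10, 10)]


def fit_taxon(y, log_depth, fixed_slope=None):
    """Assumed model: log mu = b0 + b1*x, log theta = g0 + g1*x, logit pi = eta,
    with x = centered log depth. If fixed_slope is given, b1 is held at it.
    Returns (b0, b1, g0, g1, eta) and the maximized log-likelihood."""
    x = log_depth - log_depth.mean()

    def unpack(p):
        return p if fixed_slope is None else np.insert(p, 1, fixed_slope)

    def nll(p):
        b0, b1, g0, g1, eta = unpack(p)
        return -zinb_loglik(y, b0 + b1 * x, g0 + g1 * x, eta).sum()

    bounds = BOUNDS if fixed_slope is None else BOUNDS[:1] + BOUNDS[2:]
    res = minimize(nll, x0=np.zeros(len(bounds)), method="L-BFGS-B", bounds=bounds)
    return unpack(res.x), -res.fun


def depth_slope_lrt(y, log_depth, slope0=1.0):
    """Likelihood-ratio test of H0: b1 == slope0 (1 degree of freedom)."""
    _, ll_full = fit_taxon(y, log_depth)
    _, ll_null = fit_taxon(y, log_depth, fixed_slope=slope0)
    stat = max(2.0 * (ll_full - ll_null), 0.0)
    return stat, chi2.sf(stat, df=1)


def simulate_taxon(rng, n=3000, b0=2.0, b1=0.6, g0=0.5, g1=0.3, pi=0.2):
    """Simulate one taxon under the assumed model (synthetic data only)."""
    log_depth = rng.uniform(np.log(1e4), np.log(1e5), n)
    x = log_depth - log_depth.mean()
    mu, theta = np.exp(b0 + b1 * x), np.exp(g0 + g1 * x)
    y = rng.negative_binomial(theta, theta / (theta + mu))  # NB with mean mu
    y[rng.random(n) < pi] = 0  # structural (excess) zeros
    return y, log_depth


if __name__ == "__main__":
    y, log_depth = simulate_taxon(np.random.default_rng(0))
    params, _ = fit_taxon(y, log_depth)
    print("b0, b1, g0, g1:", params[:4].round(2), "pi:", round(float(expit(params[4])), 2))
    print("LRT stat, p-value for b1 == 1:", depth_slope_lrt(y, log_depth))
```

On the simulated taxon (true depth slope 0.6), the fitted `b1` should land near 0.6 and the test should reject proportional scaling; all values are synthetic. Fitting each taxon separately is what lets the depth effect on the mean and the dispersion differ across taxa.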

CONCLUSION

TaxaNorm removes both sample- and taxon-specific bias by introducing an appropriate regression framework for microbiome data, which aids data interpretation and visualization. The 'TaxaNorm' R package is freely available from the CRAN repository at https://CRAN.R-project.org/package=TaxaNorm, and the source code can be downloaded from https://github.com/wangziyue57/TaxaNorm.
