Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis.

Affiliations

YouKaryote Genomics, London, ON, Canada.

Department of Biochemistry, Medical Science Building, University of Western Ontario, 1151 Richmond St, N6A 5C1, London, ON, Canada.

Publication information

Microbiome. 2014 May 5;2:15. doi: 10.1186/2049-2618-2-15. eCollection 2014.

DOI:10.1186/2049-2618-2-15

PMID:24910773

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4030730/

Abstract

BACKGROUND

Experimental designs that take advantage of high-throughput sequencing to generate datasets include RNA sequencing (RNA-seq), chromatin immunoprecipitation sequencing (ChIP-seq), sequencing of 16S rRNA gene fragments, metagenomic analysis and selective growth experiments. In each case the underlying data are similar and are composed of counts of sequencing reads mapped to a large number of features in each sample. Despite this underlying similarity, the data analysis methods used for these experimental designs are all different, and do not translate across experiments. Alternative methods have been developed in the physical and geological sciences that treat similar data as compositions. Compositional data analysis methods transform the data to relative abundances with the result that the analyses are more robust and reproducible.

RESULTS

Data from an in vitro selective growth experiment, an RNA-seq experiment and the Human Microbiome Project 16S rRNA gene abundance dataset were examined by ALDEx2, a compositional data analysis tool that uses Bayesian methods to infer technical and statistical error. The ALDEx2 approach is shown to be suitable for all three types of data: it correctly identifies both the direction and differential abundance of features in the differential growth experiment, it identifies a substantially similar set of differentially expressed genes in the RNA-seq dataset as the leading tools and it identifies as differential the taxa that distinguish the tongue dorsum and buccal mucosa in the Human Microbiome Project dataset. The design of ALDEx2 reduces the number of false positive identifications that result from datasets composed of many features in few samples.

CONCLUSION

Statistical analysis of high-throughput sequencing datasets composed of per feature counts showed that the ALDEx2 R package is a simple and robust tool, which can be applied to RNA-seq, 16S rRNA gene sequencing and differential growth datasets, and by extension to other techniques that use a similar approach.
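The abstract describes the approach only in general terms: counts are treated as compositions, and Bayesian methods are used to infer technical error. The following is a minimal sketch of that idea, not ALDEx2's implementation or API. It assumes the centered log-ratio (CLR) as the compositional transform and Dirichlet sampling as the model of technical variation; neither is stated in the abstract. The function names `clr` and `dirichlet_clr_instances`, the pseudocount and prior of 0.5, and the example counts are all hypothetical.

```python
import math
import random


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's per-feature counts."""
    # The pseudocount is an assumption of this sketch; it keeps log() defined
    # when a feature has zero reads.
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [x - mean_log for x in logs]


def dirichlet_clr_instances(counts, n_instances=128, prior=0.5, seed=0):
    """Sample proportions from Dirichlet(counts + prior) and CLR-transform each draw."""
    rng = random.Random(seed)
    instances = []
    for _ in range(n_instances):
        # A Dirichlet draw is a vector of independent Gamma draws
        # normalised so that the proportions sum to 1.
        gammas = [rng.gammavariate(c + prior, 1.0) for c in counts]
        total = sum(gammas)
        logs = [math.log(g / total) for g in gammas]
        mean_log = sum(logs) / len(logs)
        instances.append([x - mean_log for x in logs])
    return instances


# Hypothetical read counts for four features in a single sample.
sample = [120, 30, 0, 850]

clr_values = clr(sample)
instances = dirichlet_clr_instances(sample)

# Spread of the CLR value of each feature across the Monte Carlo instances:
# low-count features vary more, which reflects their larger sampling uncertainty.
spread = [
    max(inst[i] for inst in instances) - min(inst[i] for inst in instances)
    for i in range(len(sample))
]
```

In this sketch each feature is expressed relative to the geometric mean of its own sample, so without the pseudocount the result would not depend on sequencing depth. Comparing groups would then mean running a test on each Monte Carlo instance and summarising across instances, which is left out here.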

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/05dd/4030730/4a5bfe9d4a0b/2049-2618-2-15-1.jpg

Similar articles

Benchmarking differential expression analysis tools for RNA-Seq: normalization-based vs. log-ratio transformation-based methods.

BMC Bioinformatics. 2018 Jul 18;19(1):274. doi: 10.1186/s12859-018-2261-8.

Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies.

Microbiome. 2016 Nov 25;4(1):62. doi: 10.1186/s40168-016-0208-8.

It's all relative: analyzing microbiome data as compositions.

Ann Epidemiol. 2016 May;26(5):322-9. doi: 10.1016/j.annepidem.2016.03.003. Epub 2016 Apr 2.

metaSPARSim: a 16S rRNA gene sequencing count data simulator.

BMC Bioinformatics. 2019 Nov 22;20(Suppl 9):416. doi: 10.1186/s12859-019-2882-6.

LOCOM: A logistic regression model for testing differential abundance in compositional microbiome data with false discovery rate control.

Proc Natl Acad Sci U S A. 2022 Jul 26;119(30):e2122788119. doi: 10.1073/pnas.2122788119. Epub 2022 Jul 22.

metamicrobiomeR: an R package for analysis of microbiome relative abundance data using zero-inflated beta GAMLSS and meta-analysis across studies using random effects models.

BMC Bioinformatics. 2019 Apr 16;20(1):188. doi: 10.1186/s12859-019-2744-2.

Modified RNA-seq method for microbial community and diversity analysis using rRNA in different types of environmental samples.

PLoS One. 2017 Oct 10;12(10):e0186161. doi: 10.1371/journal.pone.0186161. eCollection 2017.

A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome.

BMC Microbiol. 2017 Sep 13;17(1):194. doi: 10.1186/s12866-017-1101-8.

Effects of Rare Microbiome Taxa Filtering on Statistical Analysis.

Front Microbiol. 2021 Jan 12;11:607325. doi: 10.3389/fmicb.2020.607325. eCollection 2020.

Cited by

mbSparse: an autoencoder-based imputation method to address sparsity in microbiome data.

Gut Microbes. 2025 Dec;17(1):2552347. doi: 10.1080/19490976.2025.2552347. Epub 2025 Sep 1.

Skin metatranscriptomics reveals a landscape of variation in microbial activity and gene expression across the human body.

Nat Biotechnol. 2025 Aug 28. doi: 10.1038/s41587-025-02797-4.

Alterations in the Microbiome of Horses Affected with Fecal Water Syndrome.

Vet Sci. 2025 Jul 31;12(8):724. doi: 10.3390/vetsci12080724.

Use of Subtherapeutic Tylvalosin Against : Implications For Respiratory Microbiome Dysbiosis and Swine Lung Health.

Transbound Emerg Dis. 2025 Aug 18;2025:8903237. doi: 10.1155/tbed/8903237. eCollection 2025.

Explicit Scale Simulation for analysis of RNA-sequencing count data with ALDEx2.

NAR Genom Bioinform. 2025 Aug 19;7(3):lqaf108. doi: 10.1093/nargab/lqaf108. eCollection 2025 Sep.

Melody: meta-analysis of microbiome association studies for discovering generalizable microbial signatures.

Genome Biol. 2025 Aug 18;26(1):245. doi: 10.1186/s13059-025-03721-4.

Combining site-specific gut microbiome and mycobiome profiling with clinical indicators for effective management of pediatric Crohn's disease.

iScience. 2025 Jul 18;28(8):113160. doi: 10.1016/j.isci.2025.113160. eCollection 2025 Aug 15.

Florida Keys Cassiopea host benthos-like external microbiomes and a gut dominated by Vibrio, Endozoicomonas and Mycoplasma.

PLoS One. 2025 Aug 12;20(8):e0330180. doi: 10.1371/journal.pone.0330180. eCollection 2025.

Gut microbiota profile in newly diagnosed pulmonary tuberculosis patients: an exploratory pilot study in southern India.

Gut Pathog. 2025 Aug 11;17(1):59. doi: 10.1186/s13099-025-00736-x.

Holobiont-based genetic analysis reveals new plant and microbial markers for resistance against a root rot pathogen complex in pea.

BMC Plant Biol. 2025 Aug 9;25(1):1053. doi: 10.1186/s12870-025-06995-9.

References

Ecologically meaningful transformations for ordination of species data.

Oecologia. 2001 Oct;129(2):271-280. doi: 10.1007/s004420100716. Epub 2001 Oct 1.

Control of catalytic efficiency by a coevolving network of catalytic and noncatalytic residues.

Proc Natl Acad Sci U S A. 2014 Jun 10;111(23):E2376-83. doi: 10.1073/pnas.1322352111. Epub 2014 May 27.

RNA-seq differential expression studies: more sequence or more replication?

Bioinformatics. 2014 Feb 1;30(3):301-4. doi: 10.1093/bioinformatics/btt688. Epub 2013 Dec 6.

Count-based differential expression analysis of RNA sequencing data using R and Bioconductor.

Nat Protoc. 2013 Sep;8(9):1765-86. doi: 10.1038/nprot.2013.099. Epub 2013 Aug 22.

ANOVA-like differential expression (ALDEx) analysis for mixed population RNA-Seq.

PLoS One. 2013 Jul 2;8(7):e67019. doi: 10.1371/journal.pone.0067019. Print 2013.

A statistical framework for power calculations in ChIP-seq experiments.

Bioinformatics. 2014 Mar 15;30(6):753-60. doi: 10.1093/bioinformatics/btt200. Epub 2013 May 10.

Empirical Bayesian analysis of paired high-throughput sequencing data with a beta-binomial distribution.

BMC Bioinformatics. 2013 Apr 23;14:135. doi: 10.1186/1471-2105-14-135.

A comparison of methods for differential expression analysis of RNA-seq data.

BMC Bioinformatics. 2013 Mar 9;14:91. doi: 10.1186/1471-2105-14-91.

Hypothesis testing and power calculations for taxonomic-based human microbiome data.

PLoS One. 2012;7(12):e52078. doi: 10.1371/journal.pone.0052078. Epub 2012 Dec 20.

Inferring correlation networks from genomic survey data.

PLoS Comput Biol. 2012;8(9):e1002687. doi: 10.1371/journal.pcbi.1002687. Epub 2012 Sep 20.
