Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis.

Author information

YouKaryote Genomics, London, ON, Canada.

Department of Biochemistry, Medical Science Building, University of Western Ontario, 1151 Richmond St, N6A 5C1, London, ON, Canada.

Publication information

Microbiome. 2014 May 5;2:15. doi: 10.1186/2049-2618-2-15. eCollection 2014.

Abstract

BACKGROUND

Experimental designs that take advantage of high-throughput sequencing to generate datasets include RNA sequencing (RNA-seq), chromatin immunoprecipitation sequencing (ChIP-seq), sequencing of 16S rRNA gene fragments, metagenomic analysis and selective growth experiments. In each case the underlying data are similar and are composed of counts of sequencing reads mapped to a large number of features in each sample. Despite this underlying similarity, the data analysis methods used for these experimental designs are all different, and do not translate across experiments. Alternative methods have been developed in the physical and geological sciences that treat similar data as compositions. Compositional data analysis methods transform the data to relative abundances with the result that the analyses are more robust and reproducible.
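As a rough illustration of what treating read counts as a composition can look like, the sketch below applies a centered log-ratio (CLR) transform to one sample. The abstract does not name the transform, so the choice of CLR here is an assumption, as are the function name `clr`, the pseudocount used to avoid taking the log of zero, and the example counts, which are made up.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's per-feature read counts.

    The pseudocount is an illustrative assumption so that zero counts
    do not produce log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Each value is the log abundance relative to the sample's geometric mean.
    return [v - mean_log for v in logs]

# Made-up counts for four features in one sample.
sample = [120, 30, 0, 850]

# The same sample "sequenced twice as deep": every adjusted count is doubled.
doubled = [2 * c for c in sample]

clr_sample = clr(sample, pseudocount=0.5)
clr_doubled = clr(doubled, pseudocount=1.0)  # 2c + 1 == 2 * (c + 0.5)
```

Because only ratios between features enter the transform, `clr_sample` and `clr_doubled` agree up to floating-point error: the total number of reads in a sample carries no information once the data are viewed as relative abundances.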

RESULTS

Data from an in vitro selective growth experiment, an RNA-seq experiment and the Human Microbiome Project 16S rRNA gene abundance dataset were examined with ALDEx2, a compositional data analysis tool that uses Bayesian methods to infer technical and statistical error. The ALDEx2 approach is shown to be suitable for all three types of data: it correctly identifies both the direction and the differential abundance of features in the differential growth experiment; it identifies a set of differentially expressed genes in the RNA-seq dataset that is substantially similar to the set identified by the leading tools; and it identifies as differential the taxa that distinguish the tongue dorsum from the buccal mucosa in the Human Microbiome Project dataset. The design of ALDEx2 reduces the number of false positive identifications that result from datasets composed of many features in few samples.
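One way to picture Bayesian inference of technical error from counts is to treat the observed counts of a sample as a single draw from an unknown composition, sample many plausible compositions from a Dirichlet posterior, and log-ratio transform each one. The sketch below follows that general idea; it is not ALDEx2's code. The function names, the prior of 0.5, the 128 instances and the example counts are all assumptions made for illustration.

```python
import math
import random
import statistics

def dirichlet_instance(counts, prior, rng):
    """One draw from Dirichlet(counts + prior), built from Gamma variates."""
    draws = [rng.gammavariate(c + prior, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]

def clr(proportions):
    """Centered log-ratio transform of a vector of positive proportions."""
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def clr_instances(counts, n_instances=128, prior=0.5, seed=0):
    """Monte Carlo instances of the CLR-transformed composition of one sample."""
    rng = random.Random(seed)
    return [clr(dirichlet_instance(counts, prior, rng))
            for _ in range(n_instances)]

# Made-up counts for four features in one sample.
sample = [120, 30, 0, 850]
instances = clr_instances(sample)

# Per-feature spread across instances: a stand-in for technical uncertainty.
spread = [statistics.pstdev(inst[i] for inst in instances)
          for i in range(len(sample))]
```

In this sketch the feature with zero reads has a much larger `spread` than the feature with 850 reads, so a difference between groups that rests on a few low-count features is less likely to look convincing. That is one plausible reading of how accounting for sampling error can reduce false positives when there are many features and few samples.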

CONCLUSION

Statistical analysis of high-throughput sequencing datasets composed of per feature counts showed that the ALDEx2 R package is a simple and robust tool, which can be applied to RNA-seq, 16S rRNA gene sequencing and differential growth datasets, and by extension to other techniques that use a similar approach.


