Understanding sequencing data as compositions: an outlook and review.

Affiliations

Bioinformatics Core Research Group, Deakin University, Geelong, Australia.

Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.

Publication information

Bioinformatics. 2018 Aug 15;34(16):2870-2878. doi: 10.1093/bioinformatics/bty175.

Abstract

MOTIVATION

Although seldom acknowledged explicitly, count data generated by sequencing platforms exist as compositions for which the abundance of each component (e.g. gene or transcript) is only coherently interpretable relative to other components within that sample. This property arises from the assay technology itself, whereby the number of counts recorded for each sample is constrained by an arbitrary total sum (i.e. library size). Consequently, sequencing data, as compositional data, exist in a non-Euclidean space that, without normalization or transformation, renders invalid many conventional analyses, including distance measures, correlation coefficients and multivariate statistical models.
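The sum constraint described above can be illustrated with a small numerical sketch. The abundances below are hypothetical values chosen for illustration, and `closure` is an illustrative helper name, not a function taken from the review. Only the third component changes in absolute terms, yet after rescaling to a fixed total the other two appear to decrease, while the ratio between them is preserved.

```python
import numpy as np

def closure(counts, total=1.0):
    """Rescale non-negative abundances so each sample sums to `total`."""
    counts = np.asarray(counts, dtype=float)
    return total * counts / counts.sum(axis=-1, keepdims=True)

# Hypothetical absolute abundances for three genes in two samples.
sample_a = np.array([100.0, 200.0, 700.0])
sample_b = np.array([100.0, 200.0, 1400.0])  # only gene 3 doubles

prop_a = closure(sample_a)
prop_b = closure(sample_b)

# Genes 1 and 2 are unchanged in absolute terms, but their proportions drop.
print("proportions A:", prop_a)
print("proportions B:", prop_b)

# The ratio between genes 1 and 2 is unaffected by the closure.
print("gene1/gene2 in A:", prop_a[0] / prop_a[1])
print("gene1/gene2 in B:", prop_b[0] / prop_b[1])
```

This is why ratios between components, rather than individual proportions, are the quantities that remain interpretable once the total is arbitrary.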

RESULTS

The purpose of this review is to summarize the principles of compositional data analysis (CoDA), provide evidence for why sequencing data are compositional, discuss compositionally valid methods available for analyzing sequencing data, and highlight future directions with regard to this field of study.
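As one concrete instance of a compositionally motivated method, the sketch below applies the centered log-ratio (clr) transformation, a standard CoDA tool, and computes the Euclidean distance between clr-transformed samples (often called the Aitchison distance). The abstract does not name specific methods, so this choice, the function names and the count vectors are assumptions made for illustration. The example assumes strictly positive counts; zeros would need separate handling before taking logarithms.

```python
import numpy as np

def clr(x):
    """Centered log-ratio: log of each part minus the mean log of the sample."""
    x = np.asarray(x, dtype=float)  # assumes all entries are > 0
    log_x = np.log(x)
    return log_x - log_x.mean(axis=-1, keepdims=True)

def aitchison_distance(x, y):
    """Euclidean distance between clr-transformed samples."""
    return float(np.linalg.norm(clr(x) - clr(y)))

# Hypothetical counts for three components in two samples.
counts_a = np.array([10.0, 20.0, 70.0])
counts_b = np.array([30.0, 30.0, 40.0])

# Sequencing sample A five times deeper does not change its composition.
d_original = aitchison_distance(counts_a, counts_b)
d_rescaled = aitchison_distance(5 * counts_a, counts_b)

# The plain Euclidean distance on raw counts does depend on library size.
e_original = float(np.linalg.norm(counts_a - counts_b))
e_rescaled = float(np.linalg.norm(5 * counts_a - counts_b))

print("Aitchison:", d_original, d_rescaled)
print("Euclidean:", e_original, e_rescaled)
```

The log-ratio distance is unchanged when one sample is multiplied by a constant, whereas the raw Euclidean distance is not, which is the scale-invariance property that motivates log-ratio methods for data with an arbitrary total.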

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1
Understanding sequencing data as compositions: an outlook and review.
Bioinformatics. 2018 Aug 15;34(16):2870-2878. doi: 10.1093/bioinformatics/bty175.
2
It's all relative: analyzing microbiome data as compositions.
Ann Epidemiol. 2016 May;26(5):322-9. doi: 10.1016/j.annepidem.2016.03.003. Epub 2016 Apr 2.
8
Statistical modeling of sequencing errors in SAGE libraries.
Bioinformatics. 2004 Aug 4;20 Suppl 1:i31-9. doi: 10.1093/bioinformatics/bth924.
