College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, 29 Yudao St., Nanjing 210016, China.
BMC Bioinformatics. 2013 Feb 5;14:39. doi: 10.1186/1471-2105-14-39.
Microarrays have been a popular tool for genome-scale gene expression profiling for over a decade because of their low cost, short turn-around time, excellent quantitative accuracy and ease of data generation. The Bioconductor package puma incorporates a suite of analysis methods for determining uncertainties from Affymetrix GeneChip data and propagating these uncertainties to downstream analysis. As isoform-level expression profiling has attracted growing interest within genomics in recent years, exon microarray technology offers an important tool for quantifying the expression level of the majority of exons and makes it possible to measure isoform-level expression. However, puma does not include methods for the analysis of exon array data. Moreover, the current expression summarisation method for Affymetrix 3' GeneChip data is unstable for genes with low expression. For downstream analysis, the method for differential expression detection is computationally intensive, and the original expression clustering method does not consider the variance across replicated technical and biological measurements. It is therefore necessary to develop improved uncertainty propagation methods for gene and transcript expression analysis.
We extend the previously developed Bioconductor package puma with a new method designed specifically for GeneChip Exon arrays and a set of improved downstream approaches. The improvements include: (i) a new gamma model for exon arrays, which uses the multi-mappings between probes, isoforms and genes to calculate isoform and gene expression measurements together with the level of uncertainty associated with each estimate; (ii) a variant of the existing approach for the probe-level analysis of Affymetrix 3' GeneChip data, which produces more stable gene expression estimates; (iii) an improved method for detecting differential expression, which is computationally more efficient than the existing approach in the package; and (iv) an improved method for robust model-based clustering of gene expression, which takes technical and biological replicate information into consideration.
With these extensions and improvements, the puma package is now applicable to the analysis of both Affymetrix 3' GeneChips and Exon arrays for gene and isoform expression estimation. It propagates the uncertainty of expression measurements into more efficient and comprehensive downstream analysis at both the gene and isoform levels. The downstream methods are also applicable to other expression quantification platforms, such as RNA-Seq, when uncertainty information is available from the expression measurements. puma is available through Bioconductor and can be found at http://www.bioconductor.org.
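The idea of propagating measurement uncertainty into downstream analysis can be illustrated with a minimal Python sketch. It is not puma's implementation (puma is an R package and uses its own hierarchical models); it assumes, purely for illustration, that each replicate's log-expression estimate is Gaussian with a known standard deviation and that replicates are independent. The function names `combine_replicates` and `prob_positive_log_ratio` and all numbers are hypothetical.

```python
import math


def combine_replicates(means, sds):
    """Combine replicate estimates by inverse-variance weighting.

    Assumption: each replicate is an independent Gaussian estimate of the
    same underlying log-expression level. Returns (combined mean, combined sd).
    """
    weights = [1.0 / (sd * sd) for sd in sds]
    total_weight = sum(weights)
    mean = sum(w * m for w, m in zip(weights, means)) / total_weight
    return mean, math.sqrt(1.0 / total_weight)


def prob_positive_log_ratio(mean_a, sd_a, mean_b, sd_b):
    """Probability that condition A is expressed above condition B.

    Under the Gaussian assumption the difference A - B is Gaussian with
    mean (mean_a - mean_b) and variance (sd_a^2 + sd_b^2).
    """
    z = (mean_a - mean_b) / math.sqrt(sd_a ** 2 + sd_b ** 2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical log2 expression estimates and standard deviations for one gene.
treated = combine_replicates([8.1, 8.4], [0.2, 0.4])
control = combine_replicates([7.0, 7.2], [0.3, 0.3])
score = prob_positive_log_ratio(treated[0], treated[1], control[0], control[1])
print(f"P(treated > control) = {score:.3f}")
```

In this sketch, a replicate with a large standard deviation contributes little to the combined estimate, and a gene measured with high uncertainty yields a probability closer to 0.5 than a gene with the same mean difference measured precisely.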