Goldstein Darlene R
Ecole Polytechnique Fédérale de Lausanne, Institut de mathématiques, Bâtiment MA, Station 8, CH-1015 Lausanne, Switzerland.
Bioinformatics. 2006 Oct 1;22(19):2364-72. doi: 10.1093/bioinformatics/btl402. Epub 2006 Jul 28.
Studies of gene expression using high-density short oligonucleotide arrays have become standard in a variety of biological contexts. Of the measures that have been proposed to quantify expression on these arrays, multi-chip-based measures have been shown to perform well. As gene expression studies increase in size, however, using multi-chip expression measures becomes more challenging in terms of memory requirements and computing time.
A strategic alternative to exact multi-chip quantification on a full large chip set is to approximate expression values based on subsets of chips. This paper introduces an extrapolation method, Extrapolation Averaging (EA), and a resampling method, Partition Resampling (PR), to approximate expression in large studies. An examination of their properties indicates that subset-based methods can perform well compared with exact expression quantification. The focus is on short oligonucleotide chips, but the same ideas apply equally well to any array type for which expression is quantified using an entire set of arrays rather than one array at a time.
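The subset idea can be illustrated with a small sketch. The abstract does not specify the algorithms, so what follows is only one plausible reading of the partitioning idea, not the paper's method: the chips are randomly partitioned into subsets, a multi-chip summary is computed within each subset, and each chip's value is averaged over repeated partitions. Tukey median polish on a single probe set stands in for the multi-chip measure, and the names `median_polish_expression` and `partition_resample`, the subset size, and the simulated data are all assumptions made for illustration. Extrapolation Averaging is not sketched.

```python
import numpy as np


def median_polish_expression(y, n_iter=10):
    """Tukey median polish on a probes x chips matrix of log2 intensities.

    Returns one expression estimate per chip (overall level + chip effect).
    Used here only as a stand-in for a multi-chip expression measure.
    """
    resid = np.array(y, dtype=float)
    overall = 0.0
    row = np.zeros(resid.shape[0])  # probe effects
    col = np.zeros(resid.shape[1])  # chip effects
    for _ in range(n_iter):
        # Sweep out row (probe) medians.
        rmed = np.median(resid, axis=1)
        resid -= rmed[:, None]
        row += rmed
        delta = np.median(col)
        col -= delta
        overall += delta
        # Sweep out column (chip) medians.
        cmed = np.median(resid, axis=0)
        resid -= cmed[None, :]
        col += cmed
        delta = np.median(row)
        row -= delta
        overall += delta
    return overall + col


def partition_resample(y, subset_size, n_partitions, rng):
    """Hypothetical subset-based approximation (assumed, not from the paper).

    Each round randomly partitions the chips into subsets of at most
    `subset_size`, summarizes each subset on its own, and the per-chip
    results are averaged over `n_partitions` rounds.
    """
    n_chips = y.shape[1]
    total = np.zeros(n_chips)
    for _ in range(n_partitions):
        perm = rng.permutation(n_chips)
        for start in range(0, n_chips, subset_size):
            idx = perm[start:start + subset_size]
            # Only the chips in `idx` are held in the summary step.
            total[idx] += median_polish_expression(y[:, idx])
    return total / n_partitions


if __name__ == "__main__":
    # Simulated probe set: additive probe and chip effects plus noise.
    rng = np.random.default_rng(0)
    probe_effect = rng.normal(0.0, 1.0, size=11)
    chip_effect = rng.uniform(6.0, 12.0, size=40)
    y = probe_effect[:, None] + chip_effect[None, :] + rng.normal(0.0, 0.1, (11, 40))

    exact = median_polish_expression(y)
    approx = partition_resample(y, subset_size=10, n_partitions=20, rng=rng)
    print("max |exact - approx|:", np.max(np.abs(exact - approx)))
```

In this sketch each summary step sees at most `subset_size` chips, which is where a saving in memory would come from; how closely the averaged values track the full-set values depends on the subset size and on the summary measure used.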
Software implementing Partition Resampling and Extrapolation Averaging is under development as an R package for the BioConductor project.