Wu Zhijin, Aryee Martin J
Center for Statistical Sciences and Department of Community Health, Brown University, Providence, Rhode Island 02912, USA.
J Comput Biol. 2010 Oct;17(10):1385-95. doi: 10.1089/cmb.2010.0049.
Normalization has been recognized as a necessary preprocessing step in a variety of high-throughput biotechnologies. A number of normalization methods have been developed specifically for microarrays, some general and others tailored to certain experimental designs. All methods rely on assumptions about data characteristics that are expected to stay constant across samples, although some state these assumptions more explicitly than others. Most methods assume that certain quantities related to the biological signal of interest stay the same; this is reasonable for many experiments but usually not verifiable. Recently, several platforms have begun to include a large number of negative control probes that nonetheless cover nearly the entire range of the measured signal intensity. Using these probes as a normalization basis makes it possible to normalize without making assumptions about the behavior of the biological signal. We present a subset quantile normalization (SQN) procedure that normalizes based on the distribution of non-specific control features, without restriction on the behavior of specific signals. We illustrate the performance of this method using three different platforms and experimental settings. Compared to two other leading nonlinear normalization procedures, the SQN method preserves more biological variation after normalization while reducing the noise observed on control features. Although the illustration datasets are from microarray experiments, the method applies to any high-throughput technology that includes a large set of control features with constant expectations across samples. It does not require an equal number of features in all samples and tolerates missing data.
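The idea of normalizing against the control-feature distribution can be illustrated with a short sketch. This is a simplified illustration under stated assumptions, not the authors' SQN implementation: the function name `subset_quantile_normalize`, the quantile grid size, and the handling of values outside the control range (shifting by the offset at the nearest edge) are all hypothetical choices made for this example. Each sample's control quantiles are mapped onto a reference built by averaging control quantiles across samples, and every feature is transformed by the same per-sample mapping.

```python
import numpy as np

def subset_quantile_normalize(samples, control_masks, n_quantiles=101):
    """Simplified sketch of normalization based on control features.

    samples:       list of 1-D float arrays; lengths may differ, NaN allowed.
    control_masks: list of boolean arrays marking the control features.
    Returns a list of normalized arrays with the same shapes as the inputs.
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)

    # Quantiles of the control features only, one row per sample (NaN ignored).
    ctrl_q = [np.nanquantile(x[m], probs) for x, m in zip(samples, control_masks)]

    # Assumed reference: the average control quantile function across samples.
    reference = np.mean(ctrl_q, axis=0)

    normalized = []
    for x, q in zip(samples, ctrl_q):
        y = np.full(x.shape, np.nan)
        ok = ~np.isnan(x)
        xv = x[ok]

        # Inside the control range: piecewise-linear map from this sample's
        # control quantiles to the reference quantiles.
        yv = np.interp(xv, q, reference)

        # Outside the control range (assumption): shift by the edge offset
        # instead of clamping, so the ordering of features is preserved.
        low, high = xv < q[0], xv > q[-1]
        yv[low] = xv[low] - q[0] + reference[0]
        yv[high] = xv[high] - q[-1] + reference[-1]

        y[ok] = yv
        normalized.append(y)
    return normalized
```

Because the mapping is estimated from control features alone, the specific (non-control) signals are free to differ between samples, and samples with different numbers of features or with missing values can be processed together.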