Breakthrough Research Unit, University of Edinburgh, Crewe Road South, Edinburgh, EH4 2XR, UK.
BMC Med Genomics. 2012 Aug 21;5:35. doi: 10.1186/1755-8794-5-35.
Affymetrix GeneChips and Illumina BeadArrays are the most widely used commercial single-channel gene expression microarrays. Public data repositories are an extremely valuable resource, providing array-derived gene expression measurements from many thousands of experiments. Unfortunately, many of these studies are underpowered, and it is desirable to improve power by combining data from more than one study. We sought to determine whether platform-specific bias precludes direct integration of probe intensity signals for combined reanalysis.
Using Affymetrix and Illumina data from the microarray quality control project, from our own clinical samples, and from additional publicly available datasets, we evaluated several approaches to directly integrate intensity-level expression data from the two platforms. After mapping probe sequences to Ensembl genes, we demonstrate that ComBat and cross-platform normalisation (XPN) significantly outperform mean-centering and distance-weighted discrimination (DWD) in terms of minimising inter-platform variance. In particular, we observed that DWD, a popular method used in a number of previous studies, removed systematic bias at the expense of genuine biological variability, potentially removing legitimate biological differences from integrated datasets.
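As an illustration of the simplest of the approaches compared above, the following is a minimal sketch of per-gene mean-centering within each platform, together with one possible way to quantify inter-platform variance. It is not the authors' pipeline: the function names, the toy data, and the variance measure are assumptions made for this example, and the input is assumed to be a genes-by-samples matrix of log2 intensities already mapped to common gene identifiers.

```python
import numpy as np

def mean_center_by_platform(expr, platforms):
    """Subtract each gene's within-platform mean (hypothetical helper).

    expr: genes x samples array of log2 intensities.
    platforms: one platform label per sample (column).
    """
    expr = np.asarray(expr, dtype=float)
    platforms = np.asarray(platforms)
    corrected = np.empty_like(expr)
    for p in np.unique(platforms):
        cols = platforms == p  # boolean mask selecting this platform's samples
        corrected[:, cols] = expr[:, cols] - expr[:, cols].mean(axis=1, keepdims=True)
    return corrected

def inter_platform_variance(expr, platforms):
    """Average over genes of the variance between per-platform gene means.

    This is an assumed summary statistic for illustration only.
    """
    expr = np.asarray(expr, dtype=float)
    platforms = np.asarray(platforms)
    means = np.stack(
        [expr[:, platforms == p].mean(axis=1) for p in np.unique(platforms)],
        axis=1,
    )
    return float(means.var(axis=1).mean())

# Toy data: 5 genes, 4 samples per platform, with a constant platform offset.
rng = np.random.default_rng(0)
signal = rng.normal(8.0, 1.0, size=(5, 1))
affy = signal + rng.normal(0.0, 0.1, size=(5, 4)) + 1.5
illumina = signal + rng.normal(0.0, 0.1, size=(5, 4)) - 0.5
expr = np.hstack([affy, illumina])
platforms = ["affymetrix"] * 4 + ["illumina"] * 4

before = inter_platform_variance(expr, platforms)
after = inter_platform_variance(mean_center_by_platform(expr, platforms), platforms)
```

Mean-centering drives this particular statistic to zero by construction, but it implicitly assumes that both platforms profile a similar mix of samples. When that does not hold, centering can also remove genuine biological differences, which is one reason model-based corrections are often preferred.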
Normalised and batch-corrected intensity-level data from Affymetrix and Illumina microarrays can be directly combined to generate biologically meaningful results with improved statistical power for robust, integrated reanalysis.