School of Physics, University of Edinburgh, 10 Crichton Street, Edinburgh, EH8 9AB, UK.
BMC Genomics. 2010 Feb 24;11:134. doi: 10.1186/1471-2164-11-134.
Microarray technology is a popular means of producing whole-genome transcriptional profiles; however, the high cost and scarcity of mRNA have led many studies to rely on the analysis of single samples. We exploit the design of the Illumina platform, specifically the multiple arrays on each chip, to evaluate intra-experiment technical variation using repeated hybridisations of universal human reference RNA (UHRR) and duplicate hybridisations of primary breast tumour samples from a clinical study.
A clear batch-specific bias was detected in the measured expression of both the UHRR and the clinical samples. This bias persisted after standard microarray normalisation techniques were applied. However, when mean-centering or an empirical Bayes batch-correction method (ComBat) was applied to the data, inter-batch variation in the UHRR and clinical samples was greatly reduced. Following batch correction with ComBat, correlation between replicate UHRR samples improved by two orders of magnitude (from 0.9833-0.9991 to 0.9997-0.9999), and the consistency of the gene lists from the duplicate clinical samples increased from 11.6% in quantile-normalised data to 66.4% in batch-corrected data. Using UHRR as an inter-batch calibrator in conjunction with ComBat provided a small additional benefit, further increasing the agreement between the two gene lists to 74.1%.
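As an illustration of the simpler of the two corrections named above, the following is a minimal sketch of per-gene, per-batch mean-centering. It is not the authors' code and it is not ComBat: the function name `mean_center_by_batch`, the row-per-gene layout, and the toy values are assumptions made for this example only, and the input is assumed to be log-scale expression that has already been normalised.

```python
from collections import defaultdict
from statistics import mean


def mean_center_by_batch(expr, batches):
    """Subtract each gene's within-batch mean from its values.

    expr    -- list of rows, one per gene; each row has one value per sample
    batches -- list of batch labels, one per sample, in column order
    (Hypothetical helper for illustration; layout and names are assumptions.)
    """
    # Group sample (column) indices by their batch label.
    columns = defaultdict(list)
    for j, label in enumerate(batches):
        columns[label].append(j)

    centered = []
    for row in expr:
        out = list(row)
        for idx in columns.values():
            # Mean of this gene across the samples of one batch.
            batch_mean = mean(row[j] for j in idx)
            for j in idx:
                out[j] = row[j] - batch_mean
        centered.append(out)
    return centered


# Toy data: two genes, four samples, two batches (values are made up).
expr = [
    [1.0, 2.0, 5.0, 6.0],
    [3.0, 3.0, 4.0, 6.0],
]
batches = ["A", "A", "B", "B"]
corrected = mean_center_by_batch(expr, batches)
```

After centering, every gene has a mean of zero within each batch, so a constant per-batch offset is removed. The same step would also remove any genuine biological difference that happens to coincide with batch, which is one reason a balanced design matters when this kind of correction is used.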
These results suggest that, where practicality and cost dictate their use, single samples can generate reliable data, but only after careful compensation for technical bias in the experiment. We recommend that investigators take the propensity for such variation into account at the design stage of a microarray experiment, and that suitable correction methods become a routine part of the statistical analysis of the data.