Larsen Martin J, Thomassen Mads, Tan Qihua, Sørensen Kristina P, Kruse Torben A
Department of Clinical Genetics, Odense University Hospital, Sdr. Boulevard 29, 5000 Odense C, Denmark; Human Genetics, Institute of Clinical Research, University of Southern Denmark, Winsløwvej 19, 5000 Odense C, Denmark.
Human Genetics, Institute of Clinical Research, University of Southern Denmark, Winsløwvej 19, 5000 Odense C, Denmark; Epidemiology, Biostatistics and Biodemography, Institute of Public Health, University of Southern Denmark, J.B. Winsløws Vej 9B, 5000 Odense C, Denmark.
Biomed Res Int. 2014;2014:651751. doi: 10.1155/2014/651751. Epub 2014 Jul 3.
Microarray analysis is a powerful technique used extensively for gene expression profiling. Different technologies are available, but the lack of standardization makes it challenging to compare and integrate data. Furthermore, batch-related biases within datasets are common but often not addressed. We analyzed the same 234 breast cancers on two different microarray platforms. One dataset contained known batch effects associated with the fabrication procedure used. The aim was to assess the significance of correcting for systematic batch effects when integrating data from different platforms. We demonstrate the importance of detecting batch effects and show how tools such as ComBat can be used to overcome such systematic variation and unmask essential biological signals. Batch adjustment was found to be particularly valuable for detecting subtle differences in gene expression. Furthermore, our results show that proper adjustment is essential for integrating gene expression data obtained from multiple sources. We show that high-variance genes have highly reproducible expression across platforms, making them particularly well suited as biomarkers and for building gene signatures, as exemplified by prediction of estrogen-receptor status and molecular subtypes. In conclusion, the study emphasizes the importance of using proper batch adjustment methods when integrating data across different batches and platforms.
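To make the idea of batch adjustment concrete, the following is a minimal sketch of a per-gene location and scale adjustment in Python. It is not ComBat itself and not the authors' pipeline: ComBat additionally uses empirical Bayes shrinkage of the batch parameters and can protect biological covariates, both of which are omitted here. The function name `center_scale_batches` and the simulated data are hypothetical, and the sketch assumes every batch has at least two samples and a similar biological composition.

```python
import numpy as np


def center_scale_batches(expr, batches):
    """Simplified per-gene location/scale batch adjustment (not ComBat).

    expr    : genes x samples array of log-scale expression values
    batches : one batch label per sample (column of expr)
    """
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    labels = np.unique(batches)

    # Per-gene mean over all samples; adjusted batches are moved to this level.
    grand_mean = expr.mean(axis=1, keepdims=True)

    # Remove each batch's own per-gene mean.
    centered = np.empty_like(expr)
    for label in labels:
        mask = batches == label
        centered[:, mask] = expr[:, mask] - expr[:, mask].mean(axis=1, keepdims=True)

    # Pooled within-batch standard deviation per gene.
    dof = expr.shape[1] - len(labels)
    pooled_sd = np.sqrt((centered ** 2).sum(axis=1, keepdims=True) / dof)

    # Rescale each batch to the pooled spread and restore the grand mean.
    adjusted = np.empty_like(expr)
    for label in labels:
        mask = batches == label
        batch_sd = expr[:, mask].std(axis=1, ddof=1, keepdims=True)
        batch_sd = np.where(batch_sd > 0, batch_sd, 1.0)  # guard constant genes
        adjusted[:, mask] = centered[:, mask] / batch_sd * pooled_sd + grand_mean
    return adjusted


# Simulated example: 5 genes, 12 samples, second batch shifted and rescaled.
rng = np.random.default_rng(0)
expr = rng.normal(size=(5, 12))
expr[:, 6:] = expr[:, 6:] * 2.0 + 3.0
batches = ["a"] * 6 + ["b"] * 6
adjusted = center_scale_batches(expr, batches)
```

After adjustment, each gene has the same mean and standard deviation in every batch. That is only appropriate when the batches are expected to be biologically comparable; if a biological group is confounded with batch, this simple scheme would remove the signal of interest along with the batch effect.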