Yoneya Takashi, Miyazawa Tatsuya
Bioinformation. 2011 Feb 7;5(9):382-5. doi: 10.6026/97320630005382.
An enormous amount of microarray data has been collected and accumulated in public repositories. Although some depositions include both raw and processed data, a significant portion include processed data only. When multiple datasets are combined for a specific purpose, the data should be adjusted before use to remove bias between the datasets. We focused on the GeneChip platform and one pre-processing method, RMA, and examined simple quantile correction as a post-processing method for integration. Integration of data pre-processed by RMA was evaluated using artificial spike-in datasets and real microarray datasets of atopic dermatitis and lung cancer. The spike-in studies show that quantile correction for data integration reduces data quality to some extent, but the loss should remain at an acceptable level. The studies on real datasets show that quantile correction significantly reduces the bias between datasets. These results show that quantile correction is useful for integrating multiple datasets processed by RMA, and they encourage effective use of public microarray data.
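The abstract names quantile correction but does not spell out its mechanics. A minimal sketch follows, assuming the correction behaves like standard quantile normalization applied once more, at the probe-set level, to the concatenated RMA output of several datasets (RMA itself already quantile-normalizes arrays within a single run, at the probe level). It also assumes every matrix is probe sets x arrays, log2-scaled, with rows aligned to the same probe-set IDs. The helper names `quantile_normalize` and `integrate` are hypothetical and are not taken from the paper.

```python
import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Map every column (array) of a probe-set x array matrix onto one
    shared reference distribution: the mean of each quantile across arrays.

    Ties are broken by sort order, a simplification of fuller implementations.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                 # order[k, j]: row of k-th smallest value in column j
    reference = np.sort(x, axis=0).mean(axis=1)   # k-th reference value = mean of k-th smallest values
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference           # k-th smallest entry receives k-th reference value
    return out


def integrate(*datasets: np.ndarray) -> np.ndarray:
    """Concatenate RMA-processed matrices (hypothetical helper) and apply
    quantile normalization across all arrays from all datasets.

    Assumes identical probe sets in identical row order in every dataset.
    """
    if len({d.shape[0] for d in datasets}) != 1:
        raise ValueError("datasets must share the same probe sets (rows)")
    return quantile_normalize(np.hstack(datasets))


# Toy example: dataset b is dataset a with a constant batch shift of +2 (log2 units).
a = np.array([[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]])
b = a + 2.0
combined = integrate(a, b)
print(combined)
# [[ 6.5  6.5  6.5  6.5]
#  [ 8.5  8.5  8.5  8.5]
#  [10.5 10.5 10.5 10.5]]
```

In this toy case the constant offset between the two datasets disappears because all four arrays end up sharing the same set of values. With real data, probe-set rankings differ between arrays, so biological differences in rank survive while the distributional bias between datasets is removed.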