Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50010, USA.
Adv Exp Med Biol. 2011;696:145-53. doi: 10.1007/978-1-4419-7046-6_15.
Most scientific journals require published microarray experiments to meet the Minimum Information About a Microarray Experiment (MIAME) standards, which ensure that other researchers have the information needed to interpret or reproduce the results. Required MIAME information includes raw experimental data, processed data, and data processing procedures. However, the normalization method is often reported inaccurately or not at all, and the scaling factor may be unknown to all but experienced users of the normalization software. We propose that, using a seeded clustering algorithm, researchers can identify or verify unknown or doubtful normalization information. To that end, we generate descriptive statistics (mean, variance, quantiles, and moments) for normalized expression data from gene chip experiments available in the ArrayExpress database and cluster chips based on these statistics. To verify that the clustering groups chips by normalization method, we normalize raw data for chips chosen from ArrayExpress experiments using multiple methods, generate the same descriptive statistics for the normalized data, and cluster the chips using these statistics. We then use this dataset of known pedigree as seeding data to identify the normalization methods used in unknown or doubtful cases.
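The abstract does not say which seeded clustering algorithm, quantiles, or moments were used, so the following is only a minimal sketch of the general idea under stated assumptions. It summarizes each chip by its mean, variance, five quantiles, skewness, and kurtosis, then runs a seeded k-means variant in which chips of known normalization initialize the centroids and keep their labels. The names `chip_features`, `seeded_kmeans`, and `demo` are hypothetical, the data are synthetic, and the two "normalization methods" (a log2 transform and scaling to a target mean of 500) are illustrative stand-ins rather than the methods studied in the paper.

```python
import numpy as np


def chip_features(expr):
    """Summarize one chip's normalized expression values as a feature vector.

    Assumed feature set: mean, variance, five quantiles, skewness, kurtosis.
    Assumes the chip's values are not all identical (non-zero variance).
    """
    x = np.asarray(expr, dtype=float)
    mean = x.mean()
    var = x.var()
    quantiles = np.quantile(x, [0.05, 0.25, 0.5, 0.75, 0.95])
    z = (x - mean) / np.sqrt(var)
    skew = np.mean(z ** 3)  # third standardized moment
    kurt = np.mean(z ** 4)  # fourth standardized moment
    return np.concatenate(([mean, var], quantiles, [skew, kurt]))


def seeded_kmeans(X, seed_idx, seed_labels, n_iter=50):
    """Seeded k-means: seeds initialize the centroids and keep their labels.

    seed_labels must use the integers 0..k-1, each at least once.
    """
    X = np.asarray(X, dtype=float)
    seed_idx = np.asarray(seed_idx)
    seed_labels = np.asarray(seed_labels)

    # Standardize features so that no single statistic dominates the distance.
    scale = X.std(axis=0)
    scale[scale == 0] = 1.0
    Z = (X - X.mean(axis=0)) / scale

    k = int(seed_labels.max()) + 1
    centroids = np.stack(
        [Z[seed_idx[seed_labels == j]].mean(axis=0) for j in range(k)]
    )
    labels = np.full(len(Z), -1)
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        new_labels[seed_idx] = seed_labels  # chips of known pedigree stay fixed
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        centroids = np.stack([Z[labels == j].mean(axis=0) for j in range(k)])
    return labels


def demo(n_per_method=20, n_probes=2000, seed=0):
    """Cluster synthetic chips produced by two illustrative normalizations."""
    rng = np.random.default_rng(seed)
    feats, truth = [], []
    for _ in range(n_per_method):
        raw = rng.lognormal(mean=6.0, sigma=1.2, size=n_probes)
        feats.append(chip_features(np.log2(raw)))  # method 0: log2 scale
        truth.append(0)
        raw = rng.lognormal(mean=6.0, sigma=1.2, size=n_probes)
        feats.append(chip_features(raw * 500.0 / raw.mean()))  # method 1: mean 500
        truth.append(1)
    X = np.array(feats)
    truth = np.array(truth)
    seed_idx = np.array([0, 1, 2, 3])  # two seed chips per method
    labels = seeded_kmeans(X, seed_idx, truth[seed_idx])
    return labels, truth


if __name__ == "__main__":
    labels, truth = demo()
    print("fraction of chips assigned to the correct method:",
          np.mean(labels == truth))
```

In practice, the seed rows would come from chips renormalized from raw data with each candidate method, and the unlabeled rows from deposited processed data whose normalization is unknown or doubtful. Features are standardized before clustering because the statistics live on very different scales.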