Exploration, normalization, and summaries of high density oligonucleotide array probe level data.

Author information

Irizarry Rafael A, Hobbs Bridget, Collin Francois, Beazer-Barclay Yasmin D, Antonellis Kristen J, Scherf Uwe, Speed Terence P

Affiliation

Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA.

Publication information

Biostatistics. 2003 Apr;4(2):249-64. doi: 10.1093/biostatistics/4.2.249.

Abstract

In this paper we report exploratory analyses of high-density oligonucleotide array data from the Affymetrix GeneChip system with the objective of improving upon currently used measures of gene expression. Our analyses make use of three data sets: a small experimental study consisting of five MGU74A mouse GeneChip arrays; part of the data from an extensive spike-in study conducted by Gene Logic and Wyeth's Genetics Institute involving 95 HG-U95A human GeneChip arrays; and part of a dilution study conducted by Gene Logic involving 75 HG-U95A GeneChip arrays. We display some familiar features of the perfect match and mismatch probe (PM and MM) values of these data, and examine the variance-mean relationship with probe-level data from probes believed to be defective, and so delivering noise only. We explain why we need to normalize the arrays to one another using probe-level intensities. We then examine the behavior of the PM and MM using spike-in data and assess three commonly used summary measures: Affymetrix's (i) average difference (AvDiff) and (ii) MAS 5.0 signal, and (iii) the Li and Wong multiplicative model-based expression index (MBEI). The exploratory data analyses of the probe-level data motivate a new summary measure that is a robust multi-array average (RMA) of background-adjusted, normalized, and log-transformed PM values. We evaluate the four expression summary measures using the dilution study data, assessing their behavior in terms of bias, variance and (for MBEI and RMA) model fit. Finally, we evaluate the algorithms in terms of their ability to detect known levels of differential expression using the spike-in data. We conclude that there is no obvious downside to using RMA and attaching a standard error (SE) to this quantity using a linear model which removes probe-specific affinities.
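
As an illustration of the summary step described above, the following minimal sketch fits an additive model (log2 PM = overall level + probe affinity + array effect) to one probe set and reports one expression value per array. The abstract does not name the fitting procedure; Tukey's median polish is assumed here as one robust way to fit such a model. The function names and the toy data are hypothetical, and the PM values are assumed to be already background-adjusted and normalized.

```python
import math
from statistics import median


def median_polish(table, max_iter=10, tol=1e-8):
    """Fit table[i][j] ~ overall + row[i] + col[j] by alternating median sweeps."""
    n_rows, n_cols = len(table), len(table[0])
    resid = [list(r) for r in table]
    overall = 0.0
    row = [0.0] * n_rows
    col = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep row medians out of the residuals into the row effects.
        row_delta = [median(r) for r in resid]
        for i in range(n_rows):
            row[i] += row_delta[i]
            resid[i] = [x - row_delta[i] for x in resid[i]]
        # Move the median of the column effects into the overall term.
        shift = median(col)
        overall += shift
        col = [c - shift for c in col]
        # Sweep column medians out of the residuals into the column effects.
        col_delta = [median(resid[i][j] for i in range(n_rows))
                     for j in range(n_cols)]
        for j in range(n_cols):
            col[j] += col_delta[j]
            for i in range(n_rows):
                resid[i][j] -= col_delta[j]
        # Move the median of the row effects into the overall term.
        shift = median(row)
        overall += shift
        row = [r - shift for r in row]
        # Stop once a full sweep no longer changes anything.
        if max(abs(d) for d in row_delta + col_delta) < tol:
            break
    return overall, row, col, resid


def summarize_probe_set(pm):
    """pm[i][j]: adjusted, normalized PM intensity of probe i on array j.

    Returns one log2-scale expression value per array.
    """
    log_pm = [[math.log2(x) for x in probe] for probe in pm]
    overall, _affinity, array_effect, _resid = median_polish(log_pm)
    return [overall + c for c in array_effect]


# Toy probe set (hypothetical numbers): three probes on three arrays,
# built so that the additive model holds exactly on the log2 scale.
affinities = [0, 1, -1]
true_expression = [8, 9, 10]
pm = [[2.0 ** (a + e) for e in true_expression] for a in affinities]
expression = summarize_probe_set(pm)
```

Because the toy data are exactly additive on the log2 scale, the sketch recovers the per-array values used to build them; with real data the residuals would be nonzero and could feed a standard error estimate.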
