Seo Jinwook, Hoffman Eric P
Research Center for Genetic Medicine, Children's National Medical Center, 111 Michigan Ave NW, Washington DC 20010, USA.
BMC Bioinformatics. 2006 Aug 30;7:395. doi: 10.1186/1471-2105-7-395.
Affymetrix microarrays have become a standard experimental platform for mRNA expression profiling. Their success is due, in part, to the use of multiple oligonucleotide features (probes) against each transcript (probe set). These repeated measurements allow for more robust background assessments and gene expression measures, and have permitted the development of many computational methods that translate image data into a single normalized "signal" for mRNA transcript abundance. Many probe set algorithms have now been developed, with a gradual movement away from chip-by-chip methods (MAS5) toward project-based model-fitting methods (dCHIP, RMA, others). The choice of algorithm often profoundly changes data interpretation, leaving biologists unsure which interpretation of their experiment is "accurate". Here, we summarize the debate concerning probe set algorithms. We provide examples of how changes in mismatch weight, normalization, and the construction of expression ratios each dramatically change data interpretation. All of these interpretations can be considered computationally appropriate, but they vary in biological credibility. We also illustrate the performance of two new hybrid algorithms (PLIER, GC-RMA) relative to more traditional algorithms (dCHIP, MAS5, Probe Profiler PCA, RMA) using an interactive power analysis tool. PLIER appears superior to the other algorithms in avoiding false positives from poorly performing probe sets. Based on our interpretation of the literature and the examples presented here, we suggest that the variability in performance of probe set algorithms depends more on assumptions regarding "background" than on calculations of "signal". We argue that "background" is an enormously complex variable that can only be vaguely quantified, and thus the "best" probe set algorithm will vary from project to project.
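As a rough illustration of the point about mismatch weight, the sketch below shows how the weight given to mismatch (MM) probes can change an expression ratio computed from the same perfect-match (PM) intensities. It is a toy model, not MAS5, dCHIP, RMA, PLIER, or any other published algorithm: the function names, the `mm_weight` and `floor` parameters, and all intensity values are hypothetical and chosen only for illustration.

```python
import math

def probe_set_signal(pm, mm, mm_weight, floor=1.0):
    """Toy probe set summary: mean of PM - mm_weight * MM across probes.

    mm_weight = 0 ignores mismatch probes entirely; mm_weight = 1 subtracts
    them in full. Values are clipped at `floor` so the log ratio is defined.
    (Hypothetical summary, not a published algorithm.)
    """
    adjusted = [max(p - mm_weight * m, floor) for p, m in zip(pm, mm)]
    return sum(adjusted) / len(adjusted)

def log2_ratio(sample_a, sample_b, mm_weight):
    """log2(signal_b / signal_a) for two (pm, mm) samples of one probe set."""
    sig_a = probe_set_signal(sample_a[0], sample_a[1], mm_weight)
    sig_b = probe_set_signal(sample_b[0], sample_b[1], mm_weight)
    return math.log2(sig_b / sig_a)

# Made-up probe-level intensities for a single probe set (4 probe pairs).
control = ([200, 220, 180, 210], [150, 160, 140, 155])  # (PM, MM)
treated = ([300, 330, 270, 315], [160, 150, 145, 150])  # (PM, MM)

for w in (0.0, 0.5, 1.0):
    print(f"mm_weight={w:.1f}  log2 ratio={log2_ratio(control, treated, w):.3f}")
```

With these invented numbers, the apparent fold change grows as more of the mismatch intensity is treated as background, even though the raw data are unchanged. This is one simple way in which an assumption about "background" can drive the interpretation.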