Li Hua, Zhu Dongxiao, Cook Malcolm
Bioinformatics Center, Stowers Institute for Medical Research, 1000 E 50th St, Kansas City, MO 64110, USA.
BMC Genomics. 2008 Apr 24;9:188. doi: 10.1186/1471-2164-9-188.
Affymetrix GeneChip arrays typically contain multiple probe sets per gene, defined as sibling probe sets in this study. These probe sets may or may not behave similarly across treatments. How best to consolidate sibling probe sets for analysis is an open problem. We propose an Analysis of Variance (ANOVA) framework to decide which sibling probe sets can be consolidated.
The ANOVA model allows us to separate sibling probe sets into two types: those that behave similarly across treatments and those that behave differently across treatments. We found that consolidating sibling probe sets of the former type results in a large increase in the number of differentially expressed genes under various statistical criteria. The approach to selecting sibling probe sets suitable for consolidation is implemented in the R language and is freely available from http://research.stowers-institute.org/hul/affy/.
Our ANOVA analysis of sibling probe sets provides a statistical framework for selecting sibling probe sets for consolidation. Consolidating sibling probe sets by pooling their data greatly improves the estimate of a gene's expression level and results in the identification of more biologically relevant genes. Sibling probe sets that do not qualify for consolidation may represent annotation errors or other artifacts, or may correspond to differentially processed transcripts of the same gene that require further analysis.
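The abstract does not spell out the ANOVA model, so the following is only a minimal sketch of one way the selection step could be framed. It assumes a balanced two-way layout (probe set by treatment, with at least two replicates per cell) and treats a large probe set by treatment interaction F statistic as evidence that sibling probe sets behave differently across treatments. The function name `interaction_f` and the data layout are hypothetical and are not taken from the authors' R implementation.

```python
from statistics import mean


def interaction_f(data):
    """Interaction F statistic for a balanced two-way ANOVA.

    data[i][j] is a list of replicate expression values for sibling
    probe set i under treatment j. Every cell must hold the same
    number of replicates (at least 2). This layout is an assumption
    made for illustration only.
    Returns (F, df_interaction, df_error).
    """
    a = len(data)        # number of sibling probe sets
    b = len(data[0])     # number of treatments
    n = len(data[0][0])  # replicates per cell

    # Cell, row (probe set), column (treatment) and grand means.
    cell = [[mean(data[i][j]) for j in range(b)] for i in range(a)]
    row = [mean(cell[i]) for i in range(a)]
    col = [mean(cell[i][j] for i in range(a)) for j in range(b)]
    grand = mean(row)

    # Interaction sum of squares: departure of cell means from additivity.
    ss_inter = n * sum(
        (cell[i][j] - row[i] - col[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    # Error sum of squares: replicate scatter within each cell.
    ss_err = sum(
        (x - cell[i][j]) ** 2
        for i in range(a) for j in range(b) for x in data[i][j]
    )

    df_inter = (a - 1) * (b - 1)
    df_err = a * b * (n - 1)
    f_stat = (ss_inter / df_inter) / (ss_err / df_err)
    return f_stat, df_inter, df_err


# Two sibling probe sets whose profiles are parallel across two treatments
# (a constant offset only), so the interaction term is essentially zero.
parallel = [[[1.0, 1.2], [2.0, 2.2]],
            [[3.0, 3.2], [4.0, 4.2]]]

# Two sibling probe sets that respond in opposite directions.
crossed = [[[1.0, 1.2], [2.0, 2.2]],
           [[4.0, 4.2], [3.0, 3.2]]]

if __name__ == "__main__":
    print(interaction_f(parallel))
    print(interaction_f(crossed))
```

Under these assumptions, a near-zero interaction F (the `parallel` case) would mark the sibling probe sets as candidates for consolidation, while a large F (the `crossed` case) would flag them for separate analysis. Turning F into a p-value requires the F distribution with the returned degrees of freedom, which this sketch leaves to a statistics library.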