Jiang Ning, Leach Lindsey J, Hu Xiaohua, Potokina Elena, Jia Tianye, Druka Arnis, Waugh Robbie, Kearsey Michael J, Luo Zewei W
School of Biosciences, The University of Birmingham, Edgbaston Birmingham B15 2TT, England, UK.
BMC Bioinformatics. 2008 Jun 17;9:284. doi: 10.1186/1471-2105-9-284.
Affymetrix high-density oligonucleotide expression arrays are widely used across all fields of biological research to measure genome-wide gene expression. An important step in processing oligonucleotide microarray data is to produce a single value for the expression level of each RNA transcript, using one of a growing number of statistical methods. The challenge for the researcher is to decide which method is most appropriate for addressing a specific biological question with a given dataset. Several studies have assessed the performance of a few of these methods on different RNA hybridization datasets. However, the relative merits of the methods currently available for evaluating genome-wide gene expression from Affymetrix data collected in real biological experiments remain actively debated.
The present study reports a comprehensive survey of the performance of all seven commonly used methods for evaluating genome-wide gene expression in a well-designed Affymetrix microarray experiment. The experiment profiled eight genetically divergent barley cultivars, each with three biological replicates, giving the dataset a balanced and idealized structure for this analysis. The methods were evaluated on their sensitivity in detecting differentially expressed genes, the reproducibility of expression values across replicates, and their consistency in calling differentially expressed genes. At a given false discovery rate (FDR) level, the number of genes detected as differentially expressed differed between methods by a factor of two or more. We also propose using genes that contain single feature polymorphisms (SFPs) as an empirical test of each method's ability to detect true differential gene expression, on the basis that SFPs largely correspond to cis-acting expression regulators. The PDNN method outperformed all other methods in every comparison, whilst the default Affymetrix MAS5.0 method was clearly inferior.
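The abstract counts differentially expressed (DE) genes "at a given FDR level" and compares methods on the consistency of their DE calls, but it does not say which FDR procedure or consistency measure was used. The sketch below assumes the Benjamini-Hochberg step-up procedure for FDR control and uses the Jaccard index as one hypothetical measure of agreement between two methods' DE call sets. The function names and p-values are illustrative only and are not taken from the study.

```python
from typing import Sequence, Set


def benjamini_hochberg(pvalues: Sequence[float], fdr: float) -> Set[int]:
    """Return indices of genes called DE by the Benjamini-Hochberg step-up procedure.

    Assumption: BH is used here for illustration; the study's FDR procedure is not stated.
    """
    m = len(pvalues)
    # Gene indices ordered from smallest to largest p-value
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * fdr
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff = rank
    # Call DE every gene whose p-value ranks at or below that cutoff
    return set(order[:cutoff])


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Hypothetical consistency measure: overlap of two DE call sets (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Made-up per-gene p-values from two hypothetical extraction methods
p_method_a = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
p_method_b = [0.002, 0.03, 0.004, 0.5, 0.01, 0.9, 0.019, 0.6, 0.7, 0.8]

calls_a = benjamini_hochberg(p_method_a, fdr=0.05)  # {0, 1}
calls_b = benjamini_hochberg(p_method_b, fdr=0.05)  # {0, 2, 4, 6}
print(len(calls_a), len(calls_b), jaccard(calls_a, calls_b))  # 2 4 0.2
```

In this toy example the two methods call different numbers of genes at the same FDR and share only one call. This mirrors, in miniature, the kind of disagreement the survey quantifies. An SFP-based check could be layered on top by asking what fraction of each method's calls fall in genes known to carry SFPs.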
A comprehensive assessment of seven commonly used data extraction methods, based on an extensive barley Affymetrix gene expression dataset, has shown that the PDNN method performs best at detecting differentially expressed genes.