Yang Yee Hwa, Xiao Yuanyuan, Segal Mark R
Departments of Medicine, Center for Bioinformatics and Molecular Biostatistics, University of California San Francisco, CA 94143, USA.
Bioinformatics. 2005 Apr 1;21(7):1084-93. doi: 10.1093/bioinformatics/bti108. Epub 2004 Oct 28.
A common objective of microarray experiments is the detection of differential gene expression between samples obtained under different conditions. The task of identifying differentially expressed genes has two aspects: ranking and selection. Numerous statistics have been proposed to rank genes in order of evidence for differential expression. However, no one statistic is universally optimal, and there is seldom any basis or guidance for choosing a particular statistic.
Our new approach, which addresses both ranking and selection of differentially expressed genes, integrates differing statistics via a distance synthesis scheme. Using a set of (Affymetrix) spike-in datasets, in which the differentially expressed genes are known, we demonstrate that our method compares favorably with the best individual statistics, while achieving robustness properties that the individual statistics lack. We further evaluate performance on one additional microarray study.
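The abstract does not spell out the distance synthesis scheme, so the following is only a minimal sketch of one way several ranking statistics could be combined through a distance, not the authors' implementation. It assumes that every statistic is oriented so that larger values mean stronger evidence for differential expression, that each statistic is scaled by its standard deviation to make the columns comparable, and that genes are ranked by Euclidean distance to an "extreme point" built from the per-statistic maxima. The function name `distance_synthesis_rank` and the toy values are hypothetical, and the sketch covers ranking only, not selection.

```python
import numpy as np


def distance_synthesis_rank(stats):
    """Rank genes by distance to the most extreme point across statistics.

    stats: array of shape (n_genes, n_stats); larger values are assumed to
    mean stronger evidence for differential expression.
    Returns (dist, order), where order lists gene indices from strongest
    to weakest combined evidence.
    """
    stats = np.asarray(stats, dtype=float)

    # Assumption: scale each statistic by its standard deviation so that
    # no single statistic dominates the distance because of its units.
    scale = stats.std(axis=0)
    scale[scale == 0] = 1.0  # leave constant columns unscaled
    scaled = stats / scale

    # Extreme point: the best value observed for each statistic.
    extreme = scaled.max(axis=0)

    # Euclidean distance of every gene to the extreme point.
    dist = np.sqrt(((scaled - extreme) ** 2).sum(axis=1))

    # Smaller distance means stronger combined evidence.
    order = np.argsort(dist, kind="stable")
    return dist, order


# Toy input: 5 genes, 2 hypothetical statistics
# (for example |t| and |log fold change|).
toy = np.array([
    [0.5, 0.2],
    [4.0, 3.0],
    [1.0, 0.1],
    [3.5, 2.5],
    [0.2, 0.3],
])
dist, order = distance_synthesis_rank(toy)
print(order)
```

In this toy input, gene 1 attains the maximum of both statistics, so it coincides with the extreme point and ranks first. A gene that scores highly on only one statistic would be pushed down the ranking, which is the intuition behind combining statistics rather than relying on any single one.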