Lai Yinglei
Department of Statistics and Biostatistics Center, George Washington University, 2140 Pennsylvania Avenue, N.W., Washington, DC 20052, USA.
Comput Biol Chem. 2006 Oct;30(5):321-6. doi: 10.1016/j.compbiolchem.2006.06.002. Epub 2006 Sep 18.
Generalized F-statistics have been shown to perform satisfactorily in identifying differentially expressed genes from microarray data. For some complex diseases, however, a high proportion of false positives can still be identified, because disease-related genes show only modest differential expression and microarrays carry systematic noise. The main purpose of this study is to develop statistical methods for Affymetrix microarray gene expression data that reduce the impact of non-expressed genes on false positives. I propose two novel generalized F-statistics for identifying differentially expressed genes and a novel approach for estimating adjusting factors. The proposed methods systematically combine the filtering of non-expressed genes with the identification of differentially expressed genes. For comparison, the statistical methods discussed were applied to an experimental data set from a type 2 diabetes study. In both two- and three-sample analyses, the proposed statistics improved the control of false positives.