Mathur Sunil
Department of Mathematics, University of Mississippi, University, Mississippi, USA.
Appl Bioinformatics. 2005;4(4):247-51. doi: 10.2165/00822942-200504040-00004.
DNA microarray technology allows researchers to monitor the expression of thousands of genes under different conditions, and to measure the levels of thousands of different DNA molecules at a given point in the life of an organism, tissue or cell. A wide variety of diseases characterised by unregulated gene expression, DNA replication, cell division and cell death can be detected early using microarrays. One of the major objectives of microarray experiments is to identify genes that are differentially expressed under various conditions. Detecting differential gene expression under two different conditions is very important in biological studies, because it allows us to identify experimental variables that affect different biological processes. Most of the tests available in the literature are based on the assumption of a normal distribution. However, the assumption of normality may not hold for real-life data, particularly microarray data. A test is proposed for the identification of differentially expressed genes in replicated microarray experiments conducted under two different conditions. The proposed test makes no assumption about the distribution of the parent population; thus, it is strictly nonparametric in nature. We calculate the p-value and the asymptotic power function of the proposed test statistic. The proposed test statistic is compared with some of its competitors under normal, gamma and exponential populations using Monte Carlo simulation. The application of the proposed test statistic is illustrated using microarray data. The proposed test is robust and highly efficient when the populations are non-normal.
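The kind of Monte Carlo comparison described in the abstract can be sketched as follows. The abstract does not give the proposed test statistic, so this sketch substitutes the standard Wilcoxon rank-sum test as a stand-in nonparametric test and a large-sample Welch z-test as a stand-in normal-theory competitor. The function names, sample size, location shift, gamma shape and number of replications are all illustrative assumptions, not values from the paper.

```python
import math
import random
from statistics import NormalDist, mean, variance


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, assumes no ties).

    Stand-in for a nonparametric test; NOT the statistic proposed in the paper.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Sum of the 1-based ranks held by the first sample.
    w = sum(rank for rank, (_, group) in enumerate(pooled, start=1) if group == 0)
    m, n = len(x), len(y)
    mu = m * (m + n + 1) / 2
    sigma = math.sqrt(m * n * (m + n + 1) / 12)
    z = (w - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))


def welch_z_pvalue(x, y):
    """Two-sided p-value for the Welch statistic, referred to N(0, 1).

    Large-sample approximation; an exact Welch t-test would use a t reference
    distribution, which the standard library does not provide.
    """
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = (mean(x) - mean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical population setups mirroring the three families named in the abstract.
SAMPLERS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "gamma": lambda rng: rng.gammavariate(2.0, 1.0),
    "exponential": lambda rng: rng.expovariate(1.0),
}


def rejection_rate(test, sampler, shift, n=20, reps=2000, alpha=0.05, seed=0):
    """Fraction of simulated two-sample data sets in which `test` rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [sampler(rng) for _ in range(n)]
        y = [sampler(rng) + shift for _ in range(n)]  # second condition: location shift
        if test(x, y) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for name, sampler in SAMPLERS.items():
        for label, test in (("rank-sum", rank_sum_pvalue), ("Welch z", welch_z_pvalue)):
            size = rejection_rate(test, sampler, shift=0.0)   # empirical type I error
            power = rejection_rate(test, sampler, shift=0.5)  # empirical power
            print(f"{name:12s} {label:9s} size={size:.3f} power={power:.3f}")
```

Setting `shift=0.0` estimates the empirical type I error, and a non-zero shift estimates power. In a microarray setting, `x` and `y` would be the replicated expression measurements of a single gene under the two conditions, and the test would be applied gene by gene.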