Li Wentian, Yang Yaning
Center for Genomics and Human Genetics, North Shore LIJ Research Institute, 350 Community Drive, Manhasset, NY 11030, USA.
J Theor Biol. 2002 Dec 21;219(4):539-51. doi: 10.1006/jtbi.2002.3145.
Using a measure of how differentially expressed a gene is under two biochemically or phenotypically different conditions, we can rank all genes in a microarray dataset. We show that the fall-off of this measure (the normalized maximum likelihood in a classification model such as logistic regression) as a function of rank is typically a power-law function. In other, similar ranked plots, this power-law behavior is known as Zipf's law, which is observed in many natural and social phenomena. The presence of the power-law function means there is no intrinsic cutoff point between "important" genes and "irrelevant" genes. We also show that similar power-law functions are present in permuted datasets, and we provide an explanation based on the well-known chi-square (χ²) distribution of likelihood ratios. We discuss the implications of this Zipf's law for gene selection in microarray data analysis, as well as other characterizations of the ranked likelihood plots, such as the rate of fall-off of the likelihood.
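The ranked-plot idea in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes that, under the null (permuted-label) case, each gene's likelihood-ratio statistic can be approximated by an independent chi-square draw with one degree of freedom, and it uses that raw statistic in place of the normalized maximum likelihood. The function names, the gene count of 7000, and the choice of the top 500 ranks are all hypothetical values chosen for illustration.

```python
import math
import random


def ranked_null_statistics(n_genes, seed=0):
    """Draw one chi-square(1) value per gene and sort in descending order.

    Assumption: a chi-square(1) variate (the square of a standard normal)
    stands in for each gene's likelihood-ratio statistic under the null.
    """
    rng = random.Random(seed)
    stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n_genes)]
    return sorted(stats, reverse=True)


def loglog_slope(ranked, top_k):
    """Least-squares slope of log(value) against log(rank) for the top_k ranks."""
    xs = [math.log(r) for r in range(1, top_k + 1)]
    ys = [math.log(v) for v in ranked[:top_k]]
    mean_x = sum(xs) / top_k
    mean_y = sum(ys) / top_k
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical sizes: 7000 genes, slope fitted over the top 500 ranks.
ranked = ranked_null_statistics(n_genes=7000)
slope = loglog_slope(ranked, top_k=500)
print(f"log-log slope over top 500 ranks: {slope:.3f}")
```

Plotting `ranked` against rank on log-log axes, and comparing the fitted slope with the one obtained from real data, is one way to examine how much of the observed fall-off could arise from chance alone.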