Stekel D J, Git Y, Falciani F
Oxford Gene Technology, Littlemore Park, Oxford OX4 4SS, UK.
Genome Res. 2000 Dec;10(12):2055-61. doi: 10.1101/gr.gr-1325rr.
We describe a method for comparing the abundance of gene transcripts in cDNA libraries. This method allows for the comparison of gene expression in any number of libraries, in a single statistical analysis, to identify differentially expressed genes. Such genes may be of biological or pharmaceutical relevance. The formula that we derive is essentially the entropy of a partitioning of genes among cDNA libraries. This work goes beyond previously published analyses, which can either compare only two libraries, or identify a single outlier in a group of libraries. This work also addresses the problem of false positives associated with repeating the test on many thousands of genes. A randomization procedure is described that provides a quantitative measure of the degree of belief in the results; the results are further verified by considering a theoretically derived large deviations rate for the test statistic. As an example, the analysis is applied to four prostate cancer libraries from the Cancer Genome Anatomy Project. The analysis identifies biologically relevant genes that are differentially expressed in the different tumor cell types.
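The abstract does not state the formula, so the following is only a minimal sketch of what an entropy-like statistic over several libraries, paired with a randomization check, might look like. It assumes a log-likelihood-ratio form, R = sum_i x_i log(x_i / (N_i f)), where x_i is a gene's tag count in library i, N_i is that library's total tag count, and f is the gene's pooled frequency. The function names (`r_statistic`, `empirical_p`) and the per-gene empirical p-value are illustrative assumptions, not the authors' procedure, which reports a degree of belief over the whole gene set.

```python
import math
import random


def r_statistic(counts, library_sizes):
    """Entropy-like statistic for one gene across any number of libraries.

    Assumed form (not taken from the abstract):
        R = sum_i x_i * log(x_i / (N_i * f)),  f = sum_i x_i / sum_i N_i
    R is 0 when the gene's counts are exactly proportional to library size.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    f = total / sum(library_sizes)  # pooled frequency under the null hypothesis
    r = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:  # a zero count contributes nothing (x * log x -> 0)
            r += x * math.log(x / (n * f))
    return r


def randomized_r(total_count, library_sizes, rng):
    """One null draw: assign each tag to a library with probability
    proportional to library size, then recompute the statistic."""
    counts = [0] * len(library_sizes)
    for i in rng.choices(range(len(library_sizes)),
                         weights=library_sizes, k=total_count):
        counts[i] += 1
    return r_statistic(counts, library_sizes)


def empirical_p(counts, library_sizes, n_rand=1000, seed=0):
    """Return the observed statistic and the fraction of randomized
    statistics at least as large (hypothetical simplification)."""
    rng = random.Random(seed)
    observed = r_statistic(counts, library_sizes)
    total = sum(counts)
    hits = sum(
        randomized_r(total, library_sizes, rng) >= observed
        for _ in range(n_rand)
    )
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_rand + 1)


if __name__ == "__main__":
    sizes = [1000, 1000, 1000, 1000]      # made-up library sizes
    print(empirical_p([5, 5, 5, 5], sizes))   # evenly spread gene
    print(empirical_p([18, 1, 1, 0], sizes))  # gene concentrated in one library
```

Because the same statistic is computed for every gene, a per-gene cutoff alone does not control false positives across thousands of tests; in this sketch one would compare the number of genes above a threshold in the real data with the number above it in fully randomized data sets.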