Nonparametric methods for identifying differentially expressed genes in microarray data.

Authors

Troyanskaya Olga G, Garber Mitchell E, Brown Patrick O, Botstein David, Altman Russ B

Affiliation

Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA.

Publication information

Bioinformatics. 2002 Nov;18(11):1454-61. doi: 10.1093/bioinformatics/18.11.1454.

DOI:10.1093/bioinformatics/18.11.1454

PMID:12424116

Abstract

MOTIVATION

Gene expression experiments provide a fast and systematic way to identify disease markers relevant to clinical care. In this study, we address the problem of robust identification of differentially expressed genes from microarray data. Differentially expressed genes, or discriminator genes, are genes with significantly different expression in two user-defined groups of microarray experiments. We compare three model-free approaches: (1) a nonparametric t-test, (2) the Wilcoxon (or Mann-Whitney) rank sum test, and (3) a heuristic method based on high Pearson correlation to a perfectly differentiating gene ('ideal discriminator method'). We systematically assess the performance of each method on simulated and biological data under varying noise levels and p-value cutoffs.
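To make the three approaches concrete, the following is a minimal sketch of how each statistic could be computed for a single gene. The abstract does not say how p-values are obtained, so the sketch assumes they come from permuting the group labels. The unequal-variance form of the t statistic, the 1/0 coding of the perfectly differentiating gene, and all function names are likewise illustrative assumptions rather than the authors' implementation.

```python
import math
import random
from statistics import mean, variance


def t_statistic(a, b):
    """Unequal-variance t statistic (assumed form) for two groups of values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def centered_rank_sum(a, b):
    """Wilcoxon rank sum of group a, minus its expectation under no difference."""
    pooled = sorted(a + b)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # average of 1-based ranks i+1..j
        i = j
    expected = len(a) * (len(pooled) + 1) / 2
    return sum(rank_of[x] for x in a) - expected


def ideal_discriminator_correlation(a, b):
    """Pearson correlation between the gene and an assumed 1/0 'ideal' profile."""
    values = a + b
    ideal = [1.0] * len(a) + [0.0] * len(b)
    mv, mi = mean(values), mean(ideal)
    cov = sum((v - mv) * (t - mi) for v, t in zip(values, ideal))
    norm = math.sqrt(sum((v - mv) ** 2 for v in values)
                     * sum((t - mi) ** 2 for t in ideal))
    return cov / norm if norm > 0 else 0.0


def permutation_p_value(statistic, a, b, n_perm=2000, seed=0):
    """Two-sided p-value from random relabelings of the pooled samples."""
    rng = random.Random(seed)
    observed = abs(statistic(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = abs(statistic(pooled[:len(a)], pooled[len(a):]))
        if permuted >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


if __name__ == "__main__":
    # Made-up expression values for one gene in two groups of experiments.
    group_1 = [5.1, 4.8, 5.5, 5.0, 5.3, 4.9]
    group_2 = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3]
    for stat in (t_statistic, centered_rank_sum, ideal_discriminator_correlation):
        print(stat.__name__, permutation_p_value(stat, group_1, group_2))
```

A gene would then be called differentially expressed when its p-value falls at or below the chosen cutoff. Because all three statistics share one permutation routine here, any difference in the resulting gene lists comes from the statistic itself.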

RESULTS

All methods exhibit very low false positive rates and identify a large fraction of the differentially expressed genes in simulated data sets with noise level similar to that of actual data. Overall, the rank sum test appears most conservative, which may be advantageous when the computationally identified genes need to be tested biologically. However, if a more inclusive list of markers is desired, a higher p-value cutoff or the nonparametric t-test may be appropriate. When applied to data from lung tumor and lymphoma data sets, the methods identify biologically relevant differentially expressed genes that allow clear separation of the groups in question. Thus, the methods described and evaluated here provide a convenient and robust way to identify differentially expressed genes for further biological and clinical analysis.
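The kind of assessment described here, counting false positives and recovered genes at a given p-value cutoff, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the Gaussian noise model, the group sizes, the effect size, and the use of a permutation test on the difference in means as a stand-in for the paper's methods are all hypothetical choices, and the numbers it produces are not results from the study.

```python
import random


def mean_diff(a, b):
    """Difference in group means (stand-in statistic for this sketch)."""
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_p(a, b, rng, n_perm=200):
    """Two-sided permutation p-value for the difference in means."""
    observed = abs(mean_diff(a, b))
    pooled = a + b  # new list, so shuffling does not alter a or b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean_diff(pooled[:len(a)], pooled[len(a):])) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def simulate(n_genes=200, n_true=20, n_per_group=8, shift=3.0,
             noise_sd=1.0, cutoff=0.05, seed=1):
    """Return (false positive rate, fraction of true genes recovered)."""
    rng = random.Random(seed)
    false_pos = true_pos = 0
    for gene in range(n_genes):
        is_de = gene < n_true  # the first n_true genes get a real shift
        a = [rng.gauss(shift if is_de else 0.0, noise_sd)
             for _ in range(n_per_group)]
        b = [rng.gauss(0.0, noise_sd) for _ in range(n_per_group)]
        called = permutation_p(a, b, rng) <= cutoff
        if called and is_de:
            true_pos += 1
        elif called:
            false_pos += 1
    return false_pos / (n_genes - n_true), true_pos / n_true


if __name__ == "__main__":
    for cutoff in (0.01, 0.05):
        fpr, recovered = simulate(cutoff=cutoff)
        print(f"cutoff={cutoff}: false positive rate={fpr:.3f}, "
              f"recovered={recovered:.2f}")
```

Raising `noise_sd` or lowering `cutoff` in this sketch shows the trade-off the text describes: a stricter cutoff gives a more conservative list, while a looser one is more inclusive at the cost of more false positives.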

