Selection of differentially expressed genes in microarray data analysis.

Authors

Chen J J, Wang S-J, Tsai C-A, Lin C-J

Affiliation

Division of Biometry and Risk Assessment, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR 72079, USA.

Publication information

Pharmacogenomics J. 2007 Jun;7(3):212-20. doi: 10.1038/sj.tpj.6500412. Epub 2006 Aug 29.

Abstract

One common objective in microarray experiments is to identify a subset of genes that are differentially expressed among different experimental conditions, for example, between drug treatment and no drug treatment. Often, the goal is to determine the underlying relationship between poor versus good gene signatures for identifying biological functions or predicting specific therapeutic outcomes. Because of the complexity of studying hundreds or thousands of genes in an experiment, selecting a subset of genes to enhance relationships among the underlying biological structures or to improve the prediction accuracy of clinical outcomes has been an important issue in microarray data analysis. Selection of differentially expressed genes is a two-step process. The first step is to select an appropriate test statistic and compute the P-value. The genes are ranked according to their P-values as evidence of differential expression. The second step is to assign a significance level, that is, to determine a cutoff threshold from the P-values in accordance with the study objective. In this paper, we consider four commonly used statistics, the t-, S- (SAM), U- (Mann-Whitney) and M-statistics, to compute the P-values for gene ranking. For assigning the significance level, we consider procedures that control the family-wise error rate or the false discovery rate to select a limited number of genes, and a receiver-operating characteristic (ROC) approach to select a larger number of genes. The ROC approach is particularly useful in genomic/genetic profiling studies. The well-known colon cancer data containing 22 normal and 40 tumor tissues are used to illustrate different gene ranking and significance level assignment methods for applications to genomic/genetic profiling studies. The P-values computed from the t-, U- and M-statistics are very similar. We discuss the common practice of using the P-value (the false-positive error probability) as the primary criterion and then using the fold-change as a surrogate measure of biological significance for gene selection. The P-value and the fold-change can be displayed together in a volcano plot. We also address several issues in gene selection.
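
The second step described above, turning ranked P-values into a cutoff, can be sketched in a few lines. The abstract does not name the specific error-controlling procedures, so this sketch assumes the Bonferroni correction as a representative family-wise error procedure and the Benjamini-Hochberg step-up rule as a representative false discovery rate procedure. The function names and the eight P-values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def bonferroni_select(pvalues: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Indices of genes kept when the family-wise error rate is held at alpha.

    Assumption: Bonferroni stands in for the paper's family-wise error procedure.
    """
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]


def benjamini_hochberg_select(pvalues: Sequence[float], q: float = 0.05) -> List[int]:
    """Indices of genes kept when the false discovery rate is held at q.

    Assumption: Benjamini-Hochberg stands in for the paper's FDR procedure.
    """
    m = len(pvalues)
    # Gene indices ordered from the smallest P-value to the largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the largest rank k with p_(k) <= k * q / m,
    # then keep every gene ranked at or above k.
    largest_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            largest_rank = rank
    return sorted(order[:largest_rank])


# Hypothetical P-values for eight genes (illustrative only, not from the paper).
pvals = [0.0001, 0.004, 0.019, 0.02, 0.041, 0.2, 0.55, 0.9]
fwer_genes = bonferroni_select(pvals)
fdr_genes = benjamini_hochberg_select(pvals)
print("FWER-controlled selection:", fwer_genes)
print("FDR-controlled selection:", fdr_genes)
```

On these made-up values the false discovery rate rule keeps more genes than the family-wise rule, which illustrates how the choice of error criterion changes the size of the selected gene list.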
