Data-adaptive test statistics for microarray data.

Author information

Mukherjee Sach, Roberts Stephen J, van der Laan Mark J

Affiliation

Department of Engineering Science, University of Oxford, UK.

Publication information

Bioinformatics. 2005 Sep 1;21 Suppl 2:ii108-14. doi: 10.1093/bioinformatics/bti1119.

DOI:10.1093/bioinformatics/bti1119

PMID:16204088

Abstract

MOTIVATION

An important task in microarray data analysis is the selection of genes that are differentially expressed between different tissue samples, such as healthy and diseased. However, microarray data contain an enormous number of dimensions (genes) and very few samples (arrays), a mismatch which poses fundamental statistical problems for the selection process that have defied easy resolution.

RESULTS

In this paper, we present a novel approach to the selection of differentially expressed genes in which test statistics are learned from data using a simple notion of reproducibility in selection results as the learning criterion. Reproducibility, as we define it, can be computed without any knowledge of the 'ground-truth', but takes advantage of certain properties of microarray data to provide an asymptotically valid guide to expected loss under the true data-generating distribution. We are therefore able to indirectly minimize expected loss, and obtain results substantially more robust than conventional methods. We apply our method to simulated and oligonucleotide array data.
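The abstract does not define its reproducibility criterion or the family of test statistics being learned. What follows is a minimal Python sketch of the general idea under stated assumptions. It is not the authors' method. It assumes a hypothetical family of regularized t-statistics indexed by a variance offset `s0`. It takes reproducibility to be the mean overlap of the top-`k` gene lists selected on two disjoint random halves of the arrays. It then picks the `s0` that maximizes that overlap. All function names, parameters, and the simulated data are illustrative.

```python
import random
import statistics
from typing import List, Sequence, Set

# Each array (sample) is a list of per-gene expression values.
Array = List[float]


def regularized_t(x: Sequence[float], y: Sequence[float], s0: float) -> float:
    """Hypothetical candidate family: Welch-style t with an additive offset s0
    in the denominator (s0 = 0 gives the ordinary Welch t-statistic)."""
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    denom = se + s0
    diff = statistics.fmean(x) - statistics.fmean(y)
    return 0.0 if denom == 0 else diff / denom


def top_k(group_a: List[Array], group_b: List[Array], s0: float, k: int) -> Set[int]:
    """Indices of the k genes with the largest |statistic| between the groups."""
    scores = []
    for g in range(len(group_a[0])):
        xa = [arr[g] for arr in group_a]
        xb = [arr[g] for arr in group_b]
        scores.append((abs(regularized_t(xa, xb, s0)), g))
    scores.sort(reverse=True)
    return {g for _, g in scores[:k]}


def reproducibility(group_a: List[Array], group_b: List[Array], s0: float,
                    k: int, n_splits: int = 20, seed: int = 0) -> float:
    """Assumed criterion: mean fraction of top-k genes shared by selections made
    on two disjoint random halves of each group. Needs >= 4 arrays per group so
    that every half has >= 2 arrays for the variance estimate."""
    rng = random.Random(seed)  # fixed seed: every candidate sees the same splits
    total = 0.0
    for _ in range(n_splits):
        a, b = group_a[:], group_b[:]
        rng.shuffle(a)
        rng.shuffle(b)
        ha, hb = len(a) // 2, len(b) // 2
        first = top_k(a[:ha], b[:hb], s0, k)
        second = top_k(a[ha:], b[hb:], s0, k)
        total += len(first & second) / k
    return total / n_splits


def choose_s0(group_a: List[Array], group_b: List[Array],
              candidates: Sequence[float], k: int) -> float:
    """Pick the candidate offset whose top-k selections are most reproducible."""
    return max(candidates, key=lambda s0: reproducibility(group_a, group_b, s0, k))


if __name__ == "__main__":
    # Toy simulated data (made up for illustration): 200 genes, 8 arrays per
    # group, genes 0-9 shifted upward in group A.
    rng = random.Random(42)
    n_genes, n_de = 200, 10
    group_a = [[rng.gauss(2.0 if g < n_de else 0.0, 1.0) for g in range(n_genes)]
               for _ in range(8)]
    group_b = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(8)]
    best = choose_s0(group_a, group_b, candidates=[0.0, 0.1, 0.5, 1.0], k=n_de)
    print("chosen s0:", best)
    print("selected genes:", sorted(top_k(group_a, group_b, best, n_de)))
```

The criterion needs no knowledge of which genes are truly differentially expressed, only the data themselves. That mirrors the abstract's point that reproducibility is computable without ground truth. This sketch makes no claim to the asymptotic guarantees the paper describes.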

AVAILABILITY

By request to the corresponding author.

Similar articles

Large scale data mining approach for gene-specific standardization of microarray gene expression data.

Bioinformatics. 2006 Dec 1;22(23):2898-904. doi: 10.1093/bioinformatics/btl500. Epub 2006 Oct 10.

MDQC: a new quality assessment method for microarrays based on quality control reports.

Bioinformatics. 2007 Dec 1;23(23):3162-9. doi: 10.1093/bioinformatics/btm487. Epub 2007 Oct 12.

Variance stabilization and normalization for one-color microarray data using a data-driven multiscale approach.

Bioinformatics. 2006 Oct 15;22(20):2547-53. doi: 10.1093/bioinformatics/btl412. Epub 2006 Jul 28.

Classification based upon gene expression data: bias and precision of error rates.

Bioinformatics. 2007 Jun 1;23(11):1363-70. doi: 10.1093/bioinformatics/btm117. Epub 2007 Mar 28.

A new outlier removal approach for cDNA microarray normalization.

Biotechniques. 2009 Aug;47(2):691-2, 694-700. doi: 10.2144/000113195.

Fusing microarray experiments with multivariate regression.

Bioinformatics. 2005 Sep 1;21 Suppl 2:ii137-43. doi: 10.1093/bioinformatics/bti1123.

Selection and validation of normalization methods for c-DNA microarrays using within-array replications.

Bioinformatics. 2007 Sep 15;23(18):2391-8. doi: 10.1093/bioinformatics/btm361. Epub 2007 Jul 27.

The ties problem resulting from counting-based error estimators and its impact on gene selection algorithms.

Bioinformatics. 2006 Oct 15;22(20):2507-15. doi: 10.1093/bioinformatics/btl438. Epub 2006 Aug 14.

Sample size calculations based on ranking and selection in microarray experiments.

Biometrics. 2008 Mar;64(1):217-26. doi: 10.1111/j.1541-0420.2007.00875.x. Epub 2007 Aug 3.

Cited by

Detection of deregulated pathways to lymphatic metastasis in oral squamous cell carcinoma.

Pathol Oncol Res. 2009 Jun;15(2):217-23. doi: 10.1007/s12253-008-9102-4. Epub 2008 Sep 18.

GEPAS, a web-based tool for microarray data analysis and interpretation.

Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W308-14. doi: 10.1093/nar/gkn303. Epub 2008 May 28.

A unified framework for finding differentially expressed genes from microarray experiments.

BMC Bioinformatics. 2007 Sep 18;8:347. doi: 10.1186/1471-2105-8-347.

Empirical study of supervised gene screening.

BMC Bioinformatics. 2006 Dec 18;7:537. doi: 10.1186/1471-2105-7-537.

Probe-level linear model fitting and mixture modeling results in high accuracy detection of differential gene expression.

BMC Bioinformatics. 2006 Aug 25;7:391. doi: 10.1186/1471-2105-7-391.

Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data.

BMC Bioinformatics. 2006 Jul 26;7:359. doi: 10.1186/1471-2105-7-359.

Next station in microarray data analysis: GEPAS.

Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W486-91. doi: 10.1093/nar/gkl197.
