Treating expression levels of different genes as a sample in microarray data analysis: is it worth a risk?

Author information

Klebanov Lev, Yakovlev Andrei

Publication information

Stat Appl Genet Mol Biol. 2006;5:Article9. doi: 10.2202/1544-6115.1185. Epub 2006 Mar 24.

Abstract

One of the prevailing ideas in the literature on microarray data analysis is to pool the expression measures across genes and treat them as a sample drawn from some distribution. Several universal laws were proposed to analytically describe this distribution. This idea raises a number of concerns. The expression levels of genes are not identically distributed random variables so that treating them as a sample amounts to sampling from a mixture of equally weighted distributions, each being associated with a different gene. The expression levels of different genes are heavily dependent random variables so that the law of large numbers and statistical goodness-of-fit tests are normally inapplicable to this kind of data. This dependence represents a very serious pitfall in microarray data analysis.
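The dependence point can be illustrated with a small simulation. This is a minimal sketch, not the authors' method: it assumes a hypothetical additive model in which log expression on one array is X_i = mu_i + A + e_i. Here mu_i is a gene-specific mean (so the genes are not identically distributed), A is a single array-level effect shared by every gene (the assumed source of dependence), and e_i is independent per-gene noise. The function names, the uniform range for mu_i and the noise scales are illustrative assumptions. If the law of large numbers applied to the pooled values, their across-gene average would settle down as the number of genes grows; under the shared effect it does not.

```python
import random
import statistics


def pooled_gene_mean(mus, sigma_array, sigma_noise, rng):
    """Average of one simulated array's values, pooled across genes.

    Hypothetical model: X_i = mu_i + A + e_i. A is drawn once per array and
    shared by all genes, which makes the genes dependent; e_i is drawn
    independently for each gene.
    """
    a = rng.gauss(0.0, sigma_array)  # shared array-level effect
    values = [mu + a + rng.gauss(0.0, sigma_noise) for mu in mus]
    return statistics.fmean(values)


def spread_of_pooled_mean(n_genes, sigma_array, n_arrays=500, seed=0):
    """Standard deviation, over simulated arrays, of the across-gene mean."""
    rng = random.Random(seed)
    # Gene-specific means (assumed range): pooling one value per gene is one
    # draw from each of n_genes different distributions, i.e. a sample from
    # an equally weighted mixture rather than from a single distribution.
    mus = [rng.uniform(4.0, 12.0) for _ in range(n_genes)]
    means = [
        pooled_gene_mean(mus, sigma_array, 1.0, rng) for _ in range(n_arrays)
    ]
    return statistics.stdev(means)


if __name__ == "__main__":
    for m in (10, 100, 1000):
        independent = spread_of_pooled_mean(m, sigma_array=0.0)
        dependent = spread_of_pooled_mean(m, sigma_array=0.5)
        print(f"genes={m:5d}  independent={independent:.3f}  dependent={dependent:.3f}")
```

Under these assumptions the independent column should shrink roughly like 1/sqrt(m), while the dependent column levels off near sigma_array (0.5 here): adding more genes does not average away the shared effect, which is why treating genes as independent draws can be misleading.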

Similar articles

Treating expression levels of different genes as a sample in microarray data analysis: is it worth a risk?

Stat Appl Genet Mol Biol. 2006;5:Article9. doi: 10.2202/1544-6115.1185. Epub 2006 Mar 24.

Stochastic dynamic modeling of short gene expression time-series data.

IEEE Trans Nanobioscience. 2008 Mar;7(1):44-55. doi: 10.1109/TNB.2008.2000149.

A new type of stochastic dependence revealed in gene expression data.

Stat Appl Genet Mol Biol. 2006;5:Article7. doi: 10.2202/1544-6115.1189. Epub 2006 Mar 6.

Identification of differential gene expression for microarray data using recursive random forest.

Chin Med J (Engl). 2008 Dec 20;121(24):2492-6.

Techniques for clustering gene expression data.

Comput Biol Med. 2008 Mar;38(3):283-93. doi: 10.1016/j.compbiomed.2007.11.001. Epub 2007 Dec 3.

A mixture model with random-effects components for clustering correlated gene-expression profiles.

Bioinformatics. 2006 Jul 15;22(14):1745-52. doi: 10.1093/bioinformatics/btl165. Epub 2006 May 3.

Pre-processing of microarray data and analysis of differential expression.

Methods Mol Biol. 2008;452:89-110. doi: 10.1007/978-1-60327-159-2_4.

Microarray data analysis for differential expression: a tutorial.

P R Health Sci J. 2009 Jun;28(2):89-104.

Sample size calculations based on ranking and selection in microarray experiments.

Biometrics. 2008 Mar;64(1):217-26. doi: 10.1111/j.1541-0420.2007.00875.x. Epub 2007 Aug 3.

A new efficient statistical test for detecting variability in the gene expression data.

Stat Methods Med Res. 2008 Aug;17(4):405-19. doi: 10.1177/0962280206078643. Epub 2007 Aug 14.

Cited by

Leveraging Big Data to Transform Drug Discovery.

Methods Mol Biol. 2019;1939:91-118. doi: 10.1007/978-1-4939-9089-4_6.

Characterization of a genomic signature of pregnancy identified in the breast.

Cancer Prev Res (Phila). 2011 Sep;4(9):1457-64. doi: 10.1158/1940-6207.CAPR-11-0021. Epub 2011 May 27.

Balancing Type One and Two Errors in Multiple Testing for Differential Expression of Genes.

Comput Stat Data Anal. 2009 Mar 15;53(5):1622-1629. doi: 10.1016/j.csda.2008.04.010.

A Distribution-Free Convolution Model for background correction of oligonucleotide microarray data.

BMC Genomics. 2009 Jul 7;10 Suppl 1(Suppl 1):S19. doi: 10.1186/1471-2164-10-S1-S19.

Weighted analysis of general microarray experiments.

BMC Bioinformatics. 2007 Oct 15;8:387. doi: 10.1186/1471-2105-8-387.

Capturing heterogeneity in gene expression studies by surrogate variable analysis.

PLoS Genet. 2007 Sep;3(9):1724-35. doi: 10.1371/journal.pgen.0030161. Epub 2007 Aug 1.

False discovery rate paradigms for statistical analyses of microarray gene expression data.

Bioinformation. 2007 Apr 10;1(10):436-46. doi: 10.6026/97320630001436.

Is there an alternative to increasing the sample size in microarray studies?

Bioinformation. 2007 Apr 10;1(10):429-31. doi: 10.6026/97320630001429.

How high is the level of technical noise in microarray data?

Biol Direct. 2007 Apr 11;2:9. doi: 10.1186/1745-6150-2-9.
