Treating expression levels of different genes as a sample in microarray data analysis: is it worth a risk?

Author information

Klebanov Lev, Yakovlev Andrei

Publication information

Stat Appl Genet Mol Biol. 2006;5:Article9. doi: 10.2202/1544-6115.1185. Epub 2006 Mar 24.

Abstract

One of the prevailing ideas in the literature on microarray data analysis is to pool the expression measures across genes and treat them as a sample drawn from some distribution. Several universal laws were proposed to analytically describe this distribution. This idea raises a number of concerns. The expression levels of genes are not identically distributed random variables so that treating them as a sample amounts to sampling from a mixture of equally weighted distributions, each being associated with a different gene. The expression levels of different genes are heavily dependent random variables so that the law of large numbers and statistical goodness-of-fit tests are normally inapplicable to this kind of data. This dependence represents a very serious pitfall in microarray data analysis.
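The dependence argument can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a hypothetical one-factor model in which every gene's expression level is marginally N(0, 1) and all pairs of genes share the same correlation rho. The function names, the value rho = 0.5, and the array sizes are illustrative choices. Under these assumptions, the variance of the pooled mean stays near rho instead of shrinking as the number of genes grows, and a Kolmogorov-Smirnov test calibrated for independent data rejects the correct marginal distribution far more often than its nominal 5% level.

```python
import math
import random
import statistics


def simulate_array(n_genes, rho, rng):
    """One simulated array: n_genes values, each marginally N(0, 1).

    A shared factor induces pairwise correlation rho between genes
    (an assumed toy model, not the model used in the paper).
    """
    shared = rng.gauss(0.0, 1.0)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    return [a * shared + b * rng.gauss(0.0, 1.0) for _ in range(n_genes)]


def ks_statistic(values):
    """Kolmogorov-Smirnov distance between the empirical CDF and N(0, 1)."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def run(rho, n_genes=500, n_arrays=400, seed=0):
    """Return (variance of the pooled mean, KS rejection rate) over n_arrays."""
    rng = random.Random(seed)
    # Asymptotic 5% critical value, valid only for independent observations.
    crit = 1.36 / math.sqrt(n_genes)
    means = []
    rejections = 0
    for _ in range(n_arrays):
        x = simulate_array(n_genes, rho, rng)
        means.append(statistics.fmean(x))
        if ks_statistic(x) > crit:
            rejections += 1
    return statistics.pvariance(means), rejections / n_arrays


var_ind, rej_ind = run(rho=0.0)  # independent genes
var_dep, rej_dep = run(rho=0.5)  # strongly dependent genes

print(f"independent: var(mean)={var_ind:.4f}, KS rejection rate={rej_ind:.2f}")
print(f"dependent:   var(mean)={var_dep:.4f}, KS rejection rate={rej_dep:.2f}")
```

In this toy setting every gene has the same marginal distribution, so the inflated rejection rate comes from dependence alone. The mixture problem described in the abstract (genes that are not identically distributed) would be an additional source of distortion.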
