Treating expression levels of different genes as a sample in microarray data analysis: is it worth a risk?

Authors

Klebanov Lev, Yakovlev Andrei

Publication information

Stat Appl Genet Mol Biol. 2006;5:Article9. doi: 10.2202/1544-6115.1185. Epub 2006 Mar 24.

Abstract

One of the prevailing ideas in the literature on microarray data analysis is to pool the expression measures across genes and treat them as a sample drawn from some distribution. Several universal laws were proposed to analytically describe this distribution. This idea raises a number of concerns. The expression levels of genes are not identically distributed random variables so that treating them as a sample amounts to sampling from a mixture of equally weighted distributions, each being associated with a different gene. The expression levels of different genes are heavily dependent random variables so that the law of large numbers and statistical goodness-of-fit tests are normally inapplicable to this kind of data. This dependence represents a very serious pitfall in microarray data analysis.
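The point about the law of large numbers can be illustrated with a small simulation. The sketch below is not the authors' analysis; it assumes a hypothetical equicorrelated Gaussian model in which every gene on an array shares one common factor, so any two genes have correlation `rho`. Under that assumption the variance of the mean pooled across genes is `rho + (1 - rho) / n_genes`, which shrinks toward zero only when `rho = 0`. The function names and parameter values are illustrative choices.

```python
import math
import random
import statistics


def pooled_mean(n_genes, rho, rng):
    """Mean expression over all genes on one simulated array.

    Assumed model (illustrative only): each gene level is
    sqrt(rho) * shared + sqrt(1 - rho) * own_noise, giving unit variance
    per gene and correlation rho between any two genes.
    """
    shared = rng.gauss(0.0, 1.0)  # array-wide factor common to all genes
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    total = sum(a * shared + b * rng.gauss(0.0, 1.0) for _ in range(n_genes))
    return total / n_genes


def variance_of_pooled_mean(n_genes, rho, n_arrays=400, seed=0):
    """Sample variance of the pooled mean across independent simulated arrays."""
    rng = random.Random(seed)
    means = [pooled_mean(n_genes, rho, rng) for _ in range(n_arrays)]
    return statistics.variance(means)


if __name__ == "__main__":
    # Compare independent genes (rho = 0) with dependent genes (rho = 0.5).
    for n_genes in (10, 100, 1000):
        v_ind = variance_of_pooled_mean(n_genes, rho=0.0)
        v_dep = variance_of_pooled_mean(n_genes, rho=0.5)
        print(n_genes, round(v_ind, 4), round(v_dep, 4))
```

With independent genes the variance of the pooled mean falls roughly as `1 / n_genes`; with `rho = 0.5` it levels off near 0.5 however many genes are pooled, so adding genes does not behave like adding independent observations.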

