Validation and characterization of DNA microarray gene expression data distribution and associated moments.

Affiliation

Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL, USA.

Publication information

BMC Bioinformatics. 2010 Nov 24;11:576. doi: 10.1186/1471-2105-11-576.

Abstract

BACKGROUND

Data from DNA microarrays are increasingly used to understand how different conditions, exposures or diseases modulate the expression of genes in a biological system. This knowledge is then used to generate molecular mechanistic hypotheses about how an organism responds when it is exposed to those conditions. Several methods have been proposed to analyze these data under different distributional assumptions on gene expression. However, empirical validation of these assumptions is lacking.

RESULTS

Best-fit hypothesis tests, moment-ratio diagrams and relationships between the different moments of the distribution of gene expression were used to characterize the observed distributions. The data were obtained from the publicly available gene expression database Gene Expression Omnibus (GEO) in order to characterize the empirical distributions of gene expression under varying experimental situations, each of which provides a relatively large number of samples for hypothesis testing. All data were obtained from one of two microarray platforms: the commercial Affymetrix mouse 430.2 platform and a non-commercial Rosetta/Merck platform. The data from each platform were preprocessed in the same manner.
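The per-probe-set testing described above can be illustrated with a minimal sketch. The abstract does not name the test statistics that were used, so a Jarque-Bera normality test is swapped in here as a stand-in; the function names, the simulated data and the 5% significance level (corresponding to the 95% confidence level quoted in the abstract) are assumptions for illustration only, not the authors' pipeline.

```python
import math
import random


def jarque_bera_pvalue(values):
    """Asymptotic p-value of the Jarque-Bera normality test for one probe set.

    Stand-in test chosen for illustration; not necessarily the test used in the paper.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("constant sample: skewness and kurtosis are undefined")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exp(-x / 2).
    return math.exp(-jb / 2.0)


def fraction_rejected(probe_sets, alpha=0.05):
    """Share of probe sets whose normality null hypothesis is rejected at level alpha."""
    rejected = sum(1 for values in probe_sets if jarque_bera_pvalue(values) < alpha)
    return rejected / len(probe_sets)


# Simulated (hypothetical) log-expression values: 200 probe sets x 100 samples.
rng = random.Random(0)
normal_like = [[rng.gauss(8.0, 1.0) for _ in range(100)] for _ in range(200)]
skewed = [[rng.expovariate(1.0) for _ in range(100)] for _ in range(200)]

print("normal-like probe sets rejected:", fraction_rejected(normal_like))
print("skewed probe sets rejected:", fraction_rejected(skewed))
```

With real data, each inner list would hold the log-expression values of one probe set across samples, and the same loop would be repeated for each candidate distribution with a test suited to that distribution.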

CONCLUSIONS

The null hypotheses of goodness of fit for all considered univariate theoretical probability distributions (including the Normal distribution) are rejected for more than 50% of probe sets on the Affymetrix microarray platform at a 95% confidence level, suggesting that, under the tested conditions, an a priori assumption of any of these distributions across all probe sets is not valid. The pattern of null hypothesis rejection was different for the data from the Rosetta/Merck platform, with only around 20% of the probe sets failing the logistic distribution goodness-of-fit test. We find that there are statistically significant (at a 95% confidence level based on the F-test for the fitted linear model) relationships between the mean and the logarithm of the coefficient of variation of the distributions of the logarithm of gene expression. An additional novel statistically significant quadratic relationship between the skewness and kurtosis is identified. In an analysis of the L-moment ratio diagram, data from both microarray platforms fail to match any one of the chosen theoretical probability distributions.
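For readers unfamiliar with L-moment ratio diagrams, the following is a minimal sketch of how the two plotted quantities, L-skewness and L-kurtosis, can be estimated from a sample using the standard unbiased probability-weighted-moment estimators. The function name and the simulated data are assumptions for illustration; the abstract does not describe the authors' implementation. The reference points in the comments are the well-known theoretical values for the Normal, logistic and uniform distributions.

```python
import random


def l_moment_ratios(values):
    """Return sample (L-skewness, L-kurtosis) via probability-weighted moments.

    Needs at least 4 observations that are not all equal.
    """
    x = sorted(values)
    n = len(x)
    if n < 4:
        raise ValueError("at least 4 observations are required")
    b0 = b1 = b2 = b3 = 0.0
    for i, v in enumerate(x):  # i is the zero-based rank of v
        b0 += v
        b1 += v * i / (n - 1)
        b2 += v * i * (i - 1) / ((n - 1) * (n - 2))
        b3 += v * i * (i - 1) * (i - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    if l2 == 0:
        raise ValueError("constant sample: L-moment ratios are undefined")
    return l3 / l2, l4 / l2


# Theoretical reference points on the diagram (L-skewness, L-kurtosis):
#   Normal   (0, about 0.1226)
#   logistic (0, 1/6)
#   uniform  (0, 0)
rng = random.Random(1)
simulated = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # hypothetical data
tau3, tau4 = l_moment_ratios(simulated)
print("L-skewness:", tau3, "L-kurtosis:", tau4)
```

Plotting one such point per probe set against the reference points (and the curves of two-parameter families) gives the kind of diagram the abstract refers to; a cloud of points that does not concentrate near any single reference suggests that no one candidate distribution fits all probe sets.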


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3dda/3002903/460b55976354/1471-2105-11-576-1.jpg
