Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, Bellaterra, Spain.
PLoS One. 2012;7(6):e38919. doi: 10.1371/journal.pone.0038919. Epub 2012 Jun 11.
Gene expression data are influenced by multiple biological and technological factors, leading to a wide range of dispersion scenarios, although skewed patterns are not commonly addressed in microarray analyses. In this study, the distribution pattern of several human transcriptomes was studied using free-access microarray gene expression data. Our results showed that, even in previously normalized gene expression data, probe effects and differential expression within probe effects depart substantially from the commonly assumed symmetric Gaussian distribution. We developed a flexible mixed model for non-competitive microarray data analysis that accounted for asymmetric and heavy-tailed (Student's t distribution) dispersion processes. Random effects for gene expression data were modeled under asymmetric Student's t distributions where the asymmetry parameter (λ) took values from perfect symmetry (λ = 0) to right-side (λ > 0) or left-side (λ < 0) over-expression patterns. This approach was applied to four free-access human data sets and revealed clearly better model performance compared with standard approaches that assume traditional symmetric Gaussian distribution patterns. Our analyses of human gene expression data revealed a substantial degree of right-hand asymmetry for probe effects, whereas differential gene expression showed both symmetric and left-hand asymmetric patterns. Although these results cannot be extrapolated to all microarray experiments, they highlight the incidence of skewed dispersion patterns in the human transcriptome; moreover, we provide a new analytical approach to appropriately address this biological phenomenon. The source code of the program implementing these analytical developments, together with additional information on practical aspects of running the program, is freely available on request from the corresponding author of this article.
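To illustrate how the sign of the asymmetry parameter λ shapes a heavy-tailed distribution, the following is a minimal sketch that draws from one common skew-t construction (a half-normal term scaled by λ, plus Gaussian noise, divided by the square root of a gamma mixing variable). This construction, the function names `sample_skew_t` and `sample_skewness`, and the parameters `nu` and `sigma` are assumptions made for illustration; the abstract does not specify the exact parameterization used in the authors' program.

```python
import numpy as np

def sample_skew_t(n, lam, nu, sigma=1.0, seed=0):
    """Draw n values from an illustrative asymmetric Student's t distribution.

    Assumed construction (not taken from the article):
        y = (lam * |z| + eps) / sqrt(w)
    with z ~ N(0, 1), eps ~ N(0, sigma^2), w ~ Gamma(nu/2, rate=nu/2).
    lam = 0 reduces to a symmetric Student's t with nu degrees of freedom.
    """
    rng = np.random.default_rng(seed)
    z = np.abs(rng.standard_normal(n))                   # half-normal skewing term
    eps = rng.normal(0.0, sigma, n)                      # symmetric Gaussian noise
    w = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)  # mixing variable, mean 1
    return (lam * z + eps) / np.sqrt(w)

def sample_skewness(x):
    """Moment-based sample skewness (third standardized moment)."""
    c = x - x.mean()
    return (c ** 3).mean() / (c ** 2).mean() ** 1.5

if __name__ == "__main__":
    for lam in (-3.0, 0.0, 3.0):
        draws = sample_skew_t(200_000, lam=lam, nu=10.0)
        print(f"lambda = {lam:+.1f}  sample skewness = {sample_skewness(draws):+.3f}")
```

Under these assumptions, positive λ yields a longer right tail, negative λ a longer left tail, and λ = 0 recovers the symmetric case, mirroring the three regimes described in the abstract.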