Bioinformatics Research Centre, Aarhus University, C.F. Møllers Allé 8, DK-8000 Århus C, Denmark.
BMC Bioinformatics. 2011 May 27;12:215. doi: 10.1186/1471-2105-12-215.
Patterns of genome-wide methylation vary between tissue types. For example, cancer tissue shows markedly different patterns from those of normal tissue. In this paper we propose a beta-mixture model to describe genome-wide methylation patterns based on probe data from methylation microarrays. The model takes dependencies between neighbouring probe pairs into account and assumes three broad categories of methylation: low, medium and high. The model is described by 37 parameters, which significantly reduces the dimensionality of a typical methylation microarray. We used methylation microarray data from 42 colon cancer samples to assess the model.
Based on data from colon cancer samples we show that our model captures genome-wide characteristics of methylation patterns. We estimate the parameters of the model and show that they vary between different tissue types. Further, for each methylation probe the posterior probability of a methylation state (low, medium or high) is calculated and the probability that the state is correctly predicted is assessed. We demonstrate that the model can be applied to classify cancer tissue types accurately and that the model provides accessible and easily interpretable data summaries.
We have developed a beta-mixture model for methylation microarray data. The model substantially reduces the dimensionality of the data. It can be used for further analysis, such as classifying samples or detecting changes in methylation status between different samples and tissues.
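To illustrate how a posterior probability of a methylation state (low, medium or high) can be computed for a single probe, the following is a minimal sketch of a plain three-component beta mixture. It is not the model from the paper: it ignores the dependencies between neighbouring probes, and the mixture weights, shape parameters and the `posterior` function are hypothetical values and names chosen only for illustration.

```python
import math

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

# Hypothetical mixture parameters (assumptions, not estimates from the paper).
STATES = ("low", "medium", "high")
WEIGHTS = (0.5, 0.2, 0.3)                          # prior probability of each state
SHAPES = ((2.0, 20.0), (5.0, 5.0), (20.0, 2.0))    # (a, b) of each beta component

def posterior(beta_value, eps=1e-6):
    """Posterior probability of each state given one probe's methylation level.

    Applies Bayes' rule to an independent three-component beta mixture;
    neighbour dependencies are deliberately left out of this sketch.
    """
    # Clamp to the open interval so the logarithms stay finite.
    x = min(max(beta_value, eps), 1 - eps)
    joint = [w * beta_pdf(x, a, b) for w, (a, b) in zip(WEIGHTS, SHAPES)]
    total = sum(joint)
    return {state: j / total for state, j in zip(STATES, joint)}

if __name__ == "__main__":
    for value in (0.05, 0.5, 0.95):
        probs = posterior(value)
        print(value, max(probs, key=probs.get), probs)
```

The most probable state for a probe is the key with the largest posterior, and that largest value gives a rough indication of how confidently the state is assigned under these assumed parameters.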