Microarrays--identifying molecular portraits for prostate tumors with different Gleason patterns.
Authors
Mendes Alexandre, Scott Rodney J, Moscato Pablo
Affiliation
Newcastle Bioinformatics Initiative, University of Newcastle, New South Wales, Australia.
Publication information
Methods Mol Med. 2008;141:131-51. doi: 10.1007/978-1-60327-148-6_8.
In this chapter we present the combined use of several recently introduced methodologies for the analysis of microarray datasets. These computational techniques differ in type and are very powerful when combined. We have selected a publicly available prostate cancer dataset to allow for further comparisons with existing methods. The task is to identify biomarkers that correlate with the clinical phenotype of interest, i.e., Gleason patterns 3, 4, and 5. A supervised method, based on the mathematical formalism of (alpha, beta)-k-feature sets (1), is used to select differentially expressed genes. After these "molecular signatures" are identified, we apply an unsupervised method (a memetic algorithm) to order the samples (2). The objective is to maximize a global measure of correlation in the two-dimensional display of gene expression profiles. With the resulting ordering and taxonomy we are able to identify samples that have been assigned a certain Gleason pattern but whose gene expression patterns differ from those of most other samples in the group. We repeat the approach to obtain molecular signatures that produce coherent patterns of gene expression in each of the three Gleason pattern groups, and we analyze the statistically significant patterns of gene expression that seem to be implicated in these different stages of disease.
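The abstract names the (alpha, beta)-k-feature set formalism without defining it. The sketch below illustrates one common reading of that definition, stated here as an assumption and not taken from the chapter: given discretized expression values, a set of k features qualifies if every pair of samples from different classes differs in at least alpha of the selected features, and every pair of samples from the same class agrees in at least beta of them. The function name, the toy matrix, and the labels "G3" and "G4" are hypothetical and serve only as illustration; this checks a candidate set and is not the authors' selection procedure.

```python
from itertools import combinations


def is_alpha_beta_k_feature_set(data, labels, features, alpha, beta):
    """Check a candidate feature set against the assumed definition.

    data     -- list of samples, each a list of discretized feature values
    labels   -- class label of each sample
    features -- indices of the k selected features
    alpha    -- minimum number of differing features between classes
    beta     -- minimum number of agreeing features within a class
    """
    k = len(features)
    for i, j in combinations(range(len(data)), 2):
        differing = sum(data[i][f] != data[j][f] for f in features)
        agreeing = k - differing
        if labels[i] != labels[j]:
            # Samples from different classes must be told apart
            # by at least alpha of the selected features.
            if differing < alpha:
                return False
        elif agreeing < beta:
            # Samples from the same class must share values
            # in at least beta of the selected features.
            return False
    return True


# Made-up toy data: rows are samples, columns are discretized genes.
toy_data = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
toy_labels = ["G3", "G3", "G4", "G4"]

if __name__ == "__main__":
    print(is_alpha_beta_k_feature_set(toy_data, toy_labels, [0, 1, 2], 3, 3))
    print(is_alpha_beta_k_feature_set(toy_data, toy_labels, [3], 1, 0))
```

In the toy data, the first three columns separate the two groups perfectly, while the fourth column varies within each group and so cannot serve as a signature on its own. Larger alpha and beta values demand more redundancy from the selected features, which is what makes such a signature tolerant of noise in individual genes.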