Alexe G, Dalgin G S, Ganesan S, Delisi C, Bhanot G
The Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA.
J Biosci. 2007 Aug;32(5):1027-39. doi: 10.1007/s12038-007-0102-4.
We develop a new technique for analysing microarray data that combines principal components analysis with consensus ensemble k-clustering to find robust clusters and gene markers in the data. We apply our method to a public microarray breast cancer dataset containing gene expression levels in normal samples and in three pathological stages of disease: atypical ductal hyperplasia (ADH), ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC). Our method averages over clustering techniques and data perturbations to find stable, robust clusters and gene markers. We identify the clusters and their pathways with distinct subtypes of breast cancer (Luminal, Basal and Her2+). We confirm that the cancer phenotype develops early (in the early hyperplasia or ADH stage) and find from our analysis that each subtype progresses from ADH to DCIS to IDC along its own specific pathway, as if each were a distinct disease.
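The idea of finding stable clusters by averaging over data perturbations can be illustrated with a small consensus matrix. This is a minimal sketch under stated assumptions, not the authors' implementation. It uses a single clustering technique (plain k-means with farthest-first seeding) on random subsamples, and it omits both the principal components projection and the averaging over several clustering techniques. The function names, parameters and toy data are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_iter=20):
    """Plain k-means with farthest-first seeding; returns one label per point."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(n_iter):
        # Assign each point to its nearest center, then recompute the centers.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_matrix(data, k, n_runs=50, frac=0.8, seed=0):
    """Entry (i, j) is the fraction of runs in which samples i and j were
    placed in the same cluster, counted over runs that drew both samples."""
    rng = random.Random(seed)
    n = len(data)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(k, int(frac * n))
    for _ in range(n_runs):
        # Data perturbation (assumed form): a random subsample without replacement.
        idx = rng.sample(range(n), size)
        labels = kmeans([data[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


# Hypothetical toy data: two well-separated groups of four samples each.
toy = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2),
       (5.0, 5.1), (5.2, 4.9), (4.9, 5.0), (5.1, 5.2)]
C = consensus_matrix(toy, k=2)
print(C[0][1], C[0][4])  # a within-group pair, then a cross-group pair
```

Entries near 1 mark pairs of samples that co-cluster in almost every perturbed run, and entries near 0 mark pairs that almost never do. Intermediate values flag samples whose cluster membership is unstable, which is the kind of signal a consensus approach uses to separate robust clusters from artefacts of a single run.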