Nicolau Monica, Tibshirani Robert, Børresen-Dale Anne-Lise, Jeffrey Stefanie S
Department of Surgery, Stanford University School of Medicine, Stanford University, Stanford, CA, USA.
Bioinformatics. 2007 Apr 15;23(8):957-65. doi: 10.1093/bioinformatics/btm033. Epub 2007 Feb 3.
Genomic high-throughput technology generates massive amounts of data, providing opportunities to understand countless facets of the functioning genome. It also raises profound issues in identifying the data relevant to the biology being studied.
We introduce a method for the analysis of pathologic biology that unravels the disease characteristics of high-dimensional data. The method, disease-specific genomic analysis (DSGA), is intended to precede standard techniques such as clustering or class prediction and to enhance their performance and their ability to detect disease. DSGA measures the extent to which the disease deviates from a continuous range of normal phenotypes, and isolates the aberrant component of the data. In several microarray cancer datasets, we show that DSGA outperforms standard methods. We then use DSGA to highlight a novel subdivision of an important class of genes in breast cancer, the estrogen receptor (ER) cluster. We also identify new markers distinguishing ductal and lobular breast cancers. Although our examples focus on microarrays, DSGA generalizes to any high-dimensional genomic/proteomic data.
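The idea of splitting each tumor profile into a part explained by normal tissue and an aberrant, disease-specific remainder can be illustrated with a minimal pure-Python sketch. For illustration only, it assumes that the "continuous range of normal phenotypes" is modeled as the linear span of the normal-tissue expression profiles; the published DSGA procedure involves additional modeling steps that are not reproduced here, so this is a toy version of the idea rather than the authors' method. The function names `normal_basis`, `decompose`, and `deviation` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length expression profiles."""
    return sum(x * y for x, y in zip(a, b))


def normal_basis(normals: Sequence[Sequence[float]], tol: float = 1e-10) -> List[Vector]:
    """Orthonormal basis for the span of the normal-tissue profiles.

    Assumption (illustrative, not the published DSGA model): the normal
    range is the linear span of the normal samples. Uses modified
    Gram-Schmidt and drops vectors that are (nearly) already in the span.
    """
    basis: List[Vector] = []
    for v in normals:
        w = [float(x) for x in v]
        for q in basis:
            c = _dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(_dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def decompose(tumor: Sequence[float], basis: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Split a tumor profile into a normal component (its orthogonal
    projection onto the normal span) and a disease component (the residual)."""
    normal_part = [0.0] * len(tumor)
    for q in basis:
        c = _dot(tumor, q)
        normal_part = [n + c * qi for n, qi in zip(normal_part, q)]
    disease_part = [t - n for t, n in zip(tumor, normal_part)]
    return normal_part, disease_part


def deviation(disease_part: Sequence[float]) -> float:
    """Euclidean size of the disease component: how far the sample lies
    outside the modeled normal range."""
    return math.sqrt(_dot(disease_part, disease_part))


if __name__ == "__main__":
    # Toy data: three "genes", two normal samples spanning the first two axes.
    basis = normal_basis([[1, 0, 0], [0, 1, 0]])
    normal_part, disease_part = decompose([2, 3, 4], basis)
    print(normal_part)             # → [2.0, 3.0, 0.0]
    print(disease_part)            # → [0.0, 0.0, 4.0]
    print(deviation(disease_part)) # → 4.0
```

Gram-Schmidt keeps the sketch dependency-free. With real microarray data (thousands of genes, tens of normal samples), a linear-algebra library and a truncated normal basis would likely be more practical, and the disease components, rather than the raw profiles, would then be passed to clustering or class prediction.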