Qiu Peng, Wang Z Jane, Liu K J Ray
Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA.
Bioinformatics. 2005 Jul 15;21(14):3114-21. doi: 10.1093/bioinformatics/bti483. Epub 2005 May 6.
DNA microarray technologies make it possible to monitor the expression levels of thousands of genes simultaneously. A topic of great interest is how expression profiles differ between microarray samples from cancer patients and those from normal subjects, studied by classifying the samples according to their gene expression levels. Various clustering methods have been proposed in the literature to classify cancer and normal samples based on microarray data, and these are predominantly data-driven approaches. In this paper, we propose an alternative, model-driven approach, which can reveal the relationship between the global gene expression profile and the subject's health status and is therefore promising for predicting the early development of cancer.
In this work, we propose an ensemble dependence model aimed at exploring the group dependence relationships among gene clusters. Under a hypothesis-testing framework, we use the dependence relationships among genes as features to model and classify cancer and normal samples. The proposed classification scheme is applied to several real cancer datasets, including cDNA microarray, Affymetrix microarray and proteomic data, and yields very promising performance. We further investigate the eigenvalue patterns of the proposed models and find that cancer and normal samples exhibit different patterns. Moreover, the transition between the cancer and normal patterns suggests that the eigenvalue patterns of the proposed models may have the potential to predict the early stage of cancer development. In addition, we examine the effects of possible model mismatch on the proposed scheme.
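The abstract does not state the model equations, so the sketch below is only a rough illustration under assumptions of our own, not the authors' formulation. It assumes genes have already been grouped into K clusters, that each sample is summarized by K cluster-level expression values, and that within each class every cluster is approximated as a linear combination of the other clusters, giving a dependence matrix A with zero diagonal. A new sample is assigned to whichever class model leaves the smaller residual, a simplified stand-in for the paper's hypothesis test, and the eigenvalue magnitudes of A illustrate the kind of quantity an eigenvalue-pattern analysis would inspect. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_dependence_matrix(X):
    """Fit a K x K matrix A with zero diagonal so that each cluster's
    expression is approximated by a linear combination of the others.

    X has shape (n_samples, K): one row per sample, one column per gene cluster.
    (Hypothetical linear dependence model, assumed for illustration.)
    """
    _, K = X.shape
    A = np.zeros((K, K))
    for k in range(K):
        others = [j for j in range(K) if j != k]
        # Least-squares regression of cluster k on all remaining clusters.
        coef, *_ = np.linalg.lstsq(X[:, others], X[:, k], rcond=None)
        A[k, others] = coef
    return A


def residual_energy(x, A):
    """Squared norm of what the dependence model A fails to explain in x."""
    r = x - A @ x
    return float(r @ r)


def classify(x, A_normal, A_cancer):
    """Pick the class whose dependence model leaves the smaller residual
    (a simplified stand-in for a formal hypothesis test)."""
    if residual_energy(x, A_cancer) < residual_energy(x, A_normal):
        return "cancer"
    return "normal"


def eigenvalue_pattern(A):
    """Eigenvalue magnitudes of A in descending order (A need not be symmetric)."""
    return np.sort(np.abs(np.linalg.eigvals(A)))[::-1]


# Synthetic demo: three "clusters" whose dependence differs between classes.
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
noise = 0.01 * rng.normal(size=(2, n))
X_normal = np.column_stack([x1, x2, x1 + x2 + noise[0]])  # cluster 3 ~ 1 + 2
X_cancer = np.column_stack([x1, x2, x1 - x2 + noise[1]])  # cluster 3 ~ 1 - 2

A_normal = fit_dependence_matrix(X_normal)
A_cancer = fit_dependence_matrix(X_cancer)

print(classify(np.array([1.0, 2.0, 3.0]), A_normal, A_cancer))   # → normal
print(classify(np.array([1.0, 2.0, -1.0]), A_normal, A_cancer))  # → cancer
print(eigenvalue_pattern(A_normal), eigenvalue_pattern(A_cancer))
```

In this toy setup both classes share the same marginal distributions for the first two clusters, so only the dependence structure separates them; that is the point the sketch is meant to convey, not a claim about the paper's datasets or results.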