Bio IE Lab, Industrial Engineering Department, University of Puerto Rico at Mayagüez, Mayagüez, Puerto Rico.
Cancer Med. 2013 Apr;2(2):253-65. doi: 10.1002/cam4.69. Epub 2013 Feb 27.
Microarray experiments can determine the relative expression of tens of thousands of genes simultaneously, resulting in very large databases. Analyzing these databases and extracting biologically relevant knowledge from them are challenging tasks. The identification of potential cancer biomarker genes is one of the most important aims of microarray analysis and, as such, has been widely targeted in the literature. However, consistently identifying a set of these genes across different experiments, studies, microarray platforms, or cancer types remains elusive. Besides the inherent difficulty posed by the large and nonconstant variability in these experiments and the incommensurability of different microarray technologies, users must adjust a series of parameters that significantly affect the outcome of the analyses and that have no biological or medical meaning. In this study, the identification of potential cancer biomarkers from microarray data is cast as a multiple criteria optimization (MCO) problem. The efficient solutions to this problem, found here through data envelopment analysis (DEA), are associated with genes that are proposed as potential cancer biomarkers. The method does not require any parameter adjustment by the user and thus fosters repeatability. The approach also allows different microarray experiments, microarray platforms, and cancer types to be analyzed simultaneously. The results include the analysis of three publicly available microarray databases related to cervical cancer. This study points to the feasibility of modeling the selection of potential cancer biomarkers from microarray data as an MCO problem and solving it using DEA.
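The notion of "efficient solutions" can be illustrated with a minimal sketch. It assumes each gene is scored on several criteria where larger is better, for example one hypothetical score per microarray experiment such as the absolute difference in mean expression between tumor and normal samples; the gene names and numbers are invented. It does not implement DEA: it swaps in a plain Pareto dominance check that returns every nondominated gene. Fully DEA-efficient units are always nondominated, but the converse need not hold, so this filter can return more genes than a DEA model would.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every criterion and strictly better on at least one (larger is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def efficient_genes(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the genes whose criterion vectors are not dominated by any other gene (Pareto-efficient set)."""
    return [
        gene
        for gene, s in scores.items()
        if not any(dominates(other, s) for g, other in scores.items() if g != gene)
    ]


# Hypothetical data: one criterion per microarray experiment, e.g. the absolute
# difference in mean expression between tumor and normal samples.
scores = {
    "GENE_A": [2.1, 1.8, 0.9],
    "GENE_B": [1.0, 2.5, 1.2],
    "GENE_C": [0.8, 1.1, 0.7],  # dominated by both GENE_A and GENE_D
    "GENE_D": [2.0, 1.7, 1.5],
}

print(efficient_genes(scores))  # ['GENE_A', 'GENE_B', 'GENE_D']
```

Each of the three efficient genes is best on at least one trade-off between experiments, and no single weighting or cutoff had to be chosen to obtain them.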
Using MCO brings a new perspective to the identification of potential cancer biomarkers: it requires neither a threshold value to establish the significance of a particular gene nor the selection of a normalization procedure to compare different experiments.
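Both properties can be checked on toy data. The sketch below uses a Pareto dominance filter (a stand-in for DEA, not the study's model) and contrasts it with a hypothetical cutoff rule that keeps genes scoring at least 1.0 on every criterion. Multiplying one experiment's criterion by 0.1, as if that experiment had been reported on a different intensity scale, changes the cutoff-based result but leaves the nondominated set unchanged, because dominance only compares values within the same criterion. Standard radial DEA models are likewise known to be invariant to positive rescaling of a criterion. All names and values are illustrative assumptions.

```python
from typing import Dict, List, Sequence

Scores = Dict[str, List[float]]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # a dominates b: no worse on every criterion, strictly better on at least one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def efficient_genes(scores: Scores) -> List[str]:
    # Nondominated genes; no cutoff or tuning parameter appears here.
    return [g for g, s in scores.items()
            if not any(dominates(o, s) for h, o in scores.items() if h != g)]


def threshold_genes(scores: Scores, cutoff: float) -> List[str]:
    # Hypothetical cutoff-based rule for contrast: keep genes scoring >= cutoff on every criterion.
    return [g for g, s in scores.items() if all(v >= cutoff for v in s)]


def rescale(scores: Scores, index: int, factor: float) -> Scores:
    # Multiply one criterion by a positive factor, mimicking an experiment on a different scale.
    return {g: [v * factor if i == index else v for i, v in enumerate(s)]
            for g, s in scores.items()}


scores: Scores = {
    "GENE_A": [2.1, 1.8, 0.9],
    "GENE_B": [1.0, 2.5, 1.2],
    "GENE_C": [0.8, 1.1, 0.7],
    "GENE_D": [2.0, 1.7, 1.5],
}
rescaled = rescale(scores, index=1, factor=0.1)

print(efficient_genes(scores))        # ['GENE_A', 'GENE_B', 'GENE_D']
print(efficient_genes(rescaled))      # ['GENE_A', 'GENE_B', 'GENE_D']
print(threshold_genes(scores, 1.0))   # ['GENE_B', 'GENE_D']
print(threshold_genes(rescaled, 1.0)) # []
```

The cutoff rule would need both a chosen threshold and a normalization step to give stable answers across experiments; the dominance filter needs neither.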