Khoshhali Mehri, Mahjub Hossein, Saidijam Massoud, Poorolajal Jalal, Soltanian Ali Reza
Department of Biostatistics & Epidemiology, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran.
J Mol Genet Med. 2012;6:287-92. doi: 10.4172/1747-0862.1000051. Epub 2012 May 23.
The present study was conducted to predict survival time in patients with diffuse large B-cell lymphoma (DLBCL) from microarray data, using the Cox regression model combined with seven dimension reduction methods. This historical cohort included 2042 gene expression measurements from 40 patients with DLBCL. To predict survival, the Cox regression model was combined with seven dimension reduction or shrinkage methods: univariate selection, forward stepwise selection, principal component regression, supervised principal component regression, partial least squares regression, ridge regression and Lasso. Predictive capacity was assessed by three criteria: the log-rank test, the prognostic index and deviance. MATLAB R2008a and RKWard were used for data analysis. Ridge regression performed better than the other methods. Based on the ridge regression coefficients and a given cut point value, 16 genes were selected. Forward stepwise selection in the Cox regression model indicated that expression of genes GENE3555X and GENE3807X decreased survival time (P=0.008 and P=0.003, respectively), whereas expression of genes GENE3228X and GENE1551X increased survival time (P=0.002 and P<0.001, respectively). This study indicated that ridge regression had a higher capacity than the other dimension reduction methods for predicting survival time in patients with DLBCL. Furthermore, combining statistical methods with microarray data could help to detect genes that influence survival.
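As a rough illustration of the ridge-penalized Cox regression and the prognostic index named in the abstract, the Python sketch below fits the model on simulated data. Everything in it is an assumption made for illustration and is not taken from the study: the data are randomly generated (40 subjects, 5 covariates rather than 2042 genes), the function names are hypothetical, the penalty values 1.0 and 50.0 are arbitrary, the solver is plain gradient descent with backtracking, and the median is used as the cut point for the prognostic index. The study itself used MATLAB and RKWard.

```python
import math
import random


def linear_predictor(beta, X):
    """Prognostic index for each subject: the inner product x_i . beta."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def objective(beta, X, times, events, lam):
    """Negative Cox partial log-likelihood plus ridge penalty, divided by n."""
    n = len(X)
    eta = linear_predictor(beta, X)
    loglik = 0.0
    for i in range(n):
        if events[i]:
            # Risk set: subjects still under observation at times[i].
            risk = [eta[j] for j in range(n) if times[j] >= times[i]]
            m = max(risk)  # log-sum-exp shift for numerical stability
            loglik += eta[i] - (m + math.log(sum(math.exp(e - m) for e in risk)))
    return (-loglik + 0.5 * lam * sum(b * b for b in beta)) / n


def gradient(beta, X, times, events, lam):
    """Gradient of the objective above with respect to beta."""
    n, p = len(X), len(beta)
    eta = linear_predictor(beta, X)
    grad = [lam * b for b in beta]
    for i in range(n):
        if events[i]:
            risk = [j for j in range(n) if times[j] >= times[i]]
            m = max(eta[j] for j in risk)
            w = [math.exp(eta[j] - m) for j in risk]
            denom = sum(w)
            for k in range(p):
                # Risk-weighted mean of covariate k over the risk set.
                mean_k = sum(wj * X[j][k] for wj, j in zip(w, risk)) / denom
                grad[k] -= X[i][k] - mean_k
    return [g / n for g in grad]


def fit_ridge_cox(X, times, events, lam, n_iter=200):
    """Gradient descent with a backtracking (Armijo) line search."""
    beta = [0.0] * len(X[0])
    obj = objective(beta, X, times, events, lam)
    for _ in range(n_iter):
        g = gradient(beta, X, times, events, lam)
        g2 = sum(v * v for v in g)
        step = 1.0
        while step > 1e-10:
            cand = [b - step * v for b, v in zip(beta, g)]
            cand_obj = objective(cand, X, times, events, lam)
            if cand_obj <= obj - 1e-4 * step * g2:
                beta, obj = cand, cand_obj
                break
            step *= 0.5
        else:
            break  # no improving step found: treat as converged
    return beta


# Simulated data (an assumption for illustration, not the DLBCL data).
random.seed(0)
n, p = 40, 5
true_beta = [1.0, -1.0, 0.5, 0.0, 0.0]
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
times, events = [], []
for row in X:
    hazard = math.exp(sum(b * x for b, x in zip(true_beta, row)))
    t = random.expovariate(hazard)  # event time
    c = random.expovariate(0.3)     # censoring time
    times.append(min(t, c))
    events.append(t <= c)

beta_weak = fit_ridge_cox(X, times, events, lam=1.0)
beta_strong = fit_ridge_cox(X, times, events, lam=50.0)

# Prognostic index from the weakly penalized fit, split at the median.
pi = linear_predictor(beta_weak, X)
cut = sorted(pi)[n // 2]
high_risk = [i for i in range(n) if pi[i] >= cut]
low_risk = [i for i in range(n) if pi[i] < cut]
```

In this sketch a larger penalty shrinks the coefficients toward zero, which is the shrinkage behaviour the abstract attributes to ridge regression. The two risk groups formed from the prognostic index are the kind of split that a log-rank test could then compare, although that test is not implemented here.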