Zhao Yingdong, Simon Richard
Biometric Research Branch, Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA.
Cancer Inform. 2010 May 7;9:105-14. doi: 10.4137/cin.s3805.
Relatively few publications have used linear regression models to predict a continuous response from microarray expression profiles. Standard linear regression methods are problematic when the number of predictor variables exceeds the number of cases. We evaluated three linear regression algorithms that can be used to predict a continuous response from high-dimensional gene expression data: least angle regression (LAR), the least absolute shrinkage and selection operator (LASSO), and the averaged linear regression method (ALM). All methods were tested using simulations based on a real gene expression dataset and analyses of two real gene expression datasets, with prediction error estimated by an unbiased complete cross-validation approach. Our results show that the LASSO algorithm often provides a model with somewhat lower prediction error than the LAR method, and that both perform more efficiently than the ALM predictor. We have developed a plug-in for BRB-ArrayTools that implements the LAR and LASSO algorithms with complete cross-validation.
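The idea of complete cross-validation for a LASSO predictor when predictors outnumber cases can be illustrated with a minimal sketch. This is not the BRB-ArrayTools plug-in and not the authors' code: the function names (`fit_lasso`, `complete_cv`, and the helpers) are hypothetical, the data are simulated rather than taken from the paper, and the LASSO is fitted by plain coordinate descent rather than by the LAR path algorithm. The assumption made here is that "complete" means every model-building step, including the choice of the penalty, is repeated inside each outer fold using only that fold's training cases.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t (the LASSO thresholding step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_lasso(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    # Centered predictor columns, so the intercept can be recovered afterwards.
    cols = [[row[j] - x_mean[j] for row in X] for j in range(p)]
    col_sq = [sum(v * v for v in col) / n for col in cols]
    beta = [0.0] * p
    resid = [yi - y_mean for yi in y]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            col = cols[j]
            rho = sum(c * r for c, r in zip(col, resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != beta[j]:
                d = new - beta[j]
                resid = [r - c * d for r, c in zip(resid, col)]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta

def predict(model, X):
    b0, beta = model
    return [b0 + sum(b * x for b, x in zip(beta, row)) for row in X]

def k_folds(indices, k, rng):
    """Randomly split a list of case indices into k folds."""
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_error(X, y, cases, folds, lam):
    """Mean squared prediction error over the given folds of `cases`."""
    total = 0.0
    for test in folds:
        held = set(test)
        train = [i for i in cases if i not in held]
        model = fit_lasso([X[i] for i in train], [y[i] for i in train], lam)
        pred = predict(model, [X[i] for i in test])
        total += sum((y[i] - p) ** 2 for i, p in zip(test, pred))
    return total / len(cases)

def complete_cv(X, y, lams, k_outer=5, k_inner=3, seed=0):
    """Outer CV in which the penalty is re-tuned inside every training set."""
    rng = random.Random(seed)
    n = len(y)
    preds = [None] * n
    for test in k_folds(range(n), k_outer, rng):
        held = set(test)
        train = [i for i in range(n) if i not in held]
        inner = k_folds(train, k_inner, rng)  # built from training cases only
        best = min(lams, key=lambda lam: cv_error(X, y, train, inner, lam))
        model = fit_lasso([X[i] for i in train], [y[i] for i in train], best)
        for i, p in zip(test, predict(model, [X[i] for i in test])):
            preds[i] = p
    cv_mse = sum((a - b) ** 2 for a, b in zip(y, preds)) / n
    return cv_mse, preds

# Simulated example (not data from the paper): 30 cases, 40 predictors,
# only the first three predictors carry signal.
rng = random.Random(1)
n, p = 30, 40
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * r[0] - 1.5 * r[1] + r[2] + rng.gauss(0, 0.5) for r in X]

cv_mse, preds = complete_cv(X, y, lams=[0.05, 0.2, 0.8])
print("complete cross-validated MSE:", cv_mse)
```

Tuning the penalty on all cases first and then cross-validating only the final fit would let the held-out cases influence model building, which is the optimistic bias that the complete approach is meant to avoid.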