Matsui Shigeyuki, Ito Masaaki, Nishiyama Hiroyuki, Uno Hajime, Kotani Hirokazu, Watanabe Jun, Guilford Parry, Reeve Anthony, Fukushima Masanori, Ogawa Osamu
Department of Pharmacoepidemiology, Graduate School of Public Health, Kyoto University, Kyoto, Japan.
Bioinformatics. 2007 Mar 15;23(6):732-8. doi: 10.1093/bioinformatics/btl663. Epub 2007 Jan 18.
Gene expression microarray technology has made it possible to identify, from a large pool of candidate genes, those that are differentially expressed between clinical phenotypic classes of cancer. Although many class comparisons have concerned only a single phenotype, simultaneously assessing the relationship between gene expression and multiple phenotypes is warranted to better understand the underlying biological structure.
We develop a method to select genes related to multiple clinical phenotypes based on a set of multivariate linear regression models. For each gene, we perform model selection based on the doubly-adjusted R-square statistic and use the maximum of this statistic over the candidate models for gene selection. The method can substantially improve the power of gene selection compared with a conventional method that uses a single model exclusively. We apply the method to a bladder cancer study that correlates pre-treatment gene expression with pathological stage and grade. The method should be useful for screening for genes related to multiple clinical phenotypes.
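The per-gene selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the doubly-adjusted R-square statistic, so the ordinary adjusted R-square is swapped in as a stand-in, and the candidate model set is assumed to be every non-empty subset of the phenotype columns. The function names `adjusted_r2` and `max_adjusted_r2` are hypothetical.

```python
import itertools
import numpy as np


def adjusted_r2(y, X):
    """Ordinary adjusted R-square of y regressed on X (intercept added here).

    Stand-in only: the paper's doubly-adjusted statistic is not reproduced.
    Assumes y is not constant across samples.
    """
    n = y.shape[0]
    design = np.column_stack([np.ones(n), X])
    p = design.shape[1] - 1  # number of covariates, excluding the intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    return 1.0 - (rss / (n - p - 1)) / (tss / (n - 1))


def max_adjusted_r2(expr, phenotypes):
    """Score each gene by its best-fitting candidate model.

    expr:       (n_samples, n_genes) expression matrix
    phenotypes: (n_samples, n_pheno) numeric phenotype matrix
    Returns (scores, best), where scores[g] is the maximum adjusted
    R-square for gene g and best[g] is the tuple of phenotype column
    indices that attained it.
    """
    n_pheno = phenotypes.shape[1]
    # Assumed candidate set: all non-empty subsets of phenotype columns.
    subsets = [
        combo
        for k in range(1, n_pheno + 1)
        for combo in itertools.combinations(range(n_pheno), k)
    ]
    scores = np.empty(expr.shape[1])
    best = []
    for g in range(expr.shape[1]):
        values = [
            adjusted_r2(expr[:, g], phenotypes[:, list(s)]) for s in subsets
        ]
        i = int(np.argmax(values))
        scores[g] = values[i]
        best.append(subsets[i])
    return scores, best
```

Genes would then be ranked by `scores`. A real analysis would also need a selection threshold that accounts for taking the maximum over models, which this sketch omits.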
SAS and MATLAB code is available from the authors upon request.