Zhang Lei, Wang Linlin, Du Bochuan, Wang Tianjiao, Tian Pu, Tian Suyan
School of Life Science, Jilin University, 2699 Qianjin Street, Changchun, Jilin 130012, China; Department of Neurology, The Second Hospital of Jilin University, 218 Ziqiang Street, Changchun, Jilin 130041, China.
School of Life Science, Jilin University, 2699 Qianjin Street, Changchun, Jilin 130012, China.
Biomed Res Int. 2016;2016:2491671. doi: 10.1155/2016/2491671. Epub 2016 Jun 30.
Adenocarcinoma (AC) and squamous cell carcinoma (SCC) are the two major histological subtypes of non-small cell lung cancer (NSCLC), accounting for roughly 40% and 30% of all lung cancer cases, respectively. Because AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered distinct diseases. Gene expression signatures have been shown to be an effective tool for distinguishing AC from SCC. Gene set analysis is generally regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and to construct gene expression signatures. In this study, we applied SAMGSR to an NSCLC gene expression dataset. Compared with several novel feature selection algorithms, for example, LASSO, SAMGSR showed equivalent or better performance in terms of predictive ability and model parsimony. SAMGSR is therefore, in effect, a feature selection algorithm. Additionally, we applied SAMGSR to the AC and SCC subtypes separately to discriminate stages within each subtype, that is, stage II versus stage I. The small overlap between the two resulting gene signatures illustrates that AC and SCC are technically distinct diseases. Therefore, stratified analyses by subtype are recommended when constructing diagnostic or prognostic signatures for these two NSCLC subtypes.
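The overlap between two subtype-specific gene signatures, as discussed in the abstract, can be quantified in several ways. The following is a minimal sketch using the shared gene count and the Jaccard index. It is not the SAMGSR procedure and is not taken from the study: the function name `signature_overlap` and the gene identifiers are hypothetical placeholders, not the signatures reported by the authors.

```python
def signature_overlap(sig_a, sig_b):
    """Return the sorted shared genes and the Jaccard index of two gene signatures.

    Illustrative helper (an assumption, not part of the original study).
    """
    a, b = set(sig_a), set(sig_b)
    shared = a & b
    union = a | b
    # Jaccard index: |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty.
    jaccard = len(shared) / len(union) if union else 0.0
    return sorted(shared), jaccard


# Hypothetical placeholder signatures; real gene symbols would go here.
ac_signature = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
scc_signature = ["GENE_D", "GENE_E", "GENE_F"]

shared, jaccard = signature_overlap(ac_signature, scc_signature)
print(shared, round(jaccard, 3))
```

A Jaccard index near 0 corresponds to the "few overlaps" situation described above, while a value near 1 would indicate that the two subtypes share most of their stage-discriminating genes.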