Huang Bin, Zhong Ning, Cao Hongbao, Yu Guiping
Department of Cardiothoracic Surgery, Affiliated Jiangyin Hospital of Southeast University, Jiangyin, Jiangsu 214400, P.R. China.
Department of Cardiothoracic Surgery, The First People's Hospital of Kunshan, Kunshan, Jiangsu 215300, P.R. China.
Oncol Lett. 2018 Oct;16(4):5140-5146. doi: 10.3892/ol.2018.9241. Epub 2018 Jul 31.
Hundreds of genes have been shown to be associated with lung squamous cell carcinoma (LSCC), to varying degrees. In the present study, gene vectors were investigated as genetic biomarkers for the diagnosis and personalized treatment of LSCC. An LSCC genetic database (LSCC_GD) was developed through literature-based data analysis, in which 260 LSCC target genes were curated, and the associations between these genes and LSCC were then studied. Sparse representation-based variable selection (SRVS) was subsequently employed to select genes from two LSCC gene expression datasets, followed by case/control classification. The results were compared with those of analysis of variance (ANOVA)-based gene selection. Using SRVS, a gene vector was selected from each dataset that yielded a significantly higher classification ratio (CR) than randomly selected genes (for datasets GSE18842 and GSE1987, CR=100% for both, with permutation P=5.0×10 and 1.8×10, respectively). SRVS also outperformed ANOVA in terms of CR. These results indicate that, for a given dataset, there may be a gene vector among the 260 curated LSCC genes that has significant predictive power, and that SRVS is effective in identifying the optimal gene subset as a target for personalized treatment.
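The abstract does not give the SRVS formulation. The sketch below is a hypothetical illustration of the pipeline it describes (sparse gene selection, case/control classification, a permutation P-value against random gene sets and an ANOVA baseline) under loudly stated assumptions. SRVS is approximated here by an L1-penalised (Lasso) sparse representation of the case/control labels, solved by coordinate descent. The classifier is assumed to be a leave-one-out nearest-centroid rule. The permutation P-value is assumed to compare the observed CR against CRs of random gene sets of the same size. All function names are illustrative, and the demo data are synthetic, not GSE18842 or GSE1987.

```python
import random
import statistics


def anova_f(values, labels):
    """One-way ANOVA F statistic of one gene across case/control groups (baseline ranking)."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    grand = statistics.fmean(values)
    k, n = len(groups), len(values)
    ssb = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups.values())
    ssw = sum(sum((v - statistics.fmean(g)) ** 2 for v in g) for g in groups.values())
    return (ssb / (k - 1)) / (ssw / (n - k)) if ssw > 0 else float("inf")


def standardize(X):
    """Centre each gene (column) and scale it to unit population variance."""
    cols = []
    for c in zip(*X):
        mu, sd = statistics.fmean(c), statistics.pstdev(c) or 1.0
        cols.append([(v - mu) / sd for v in c])
    return [list(row) for row in zip(*cols)]


def soft(z, t):
    """Soft-thresholding operator used by the L1 (sparsity) penalty."""
    return max(z - t, 0.0) - max(-z - t, 0.0)


def sparse_select(Xs, y, lam=0.1, sweeps=200):
    """Assumed SRVS stand-in: Lasso fit of centred labels y on standardized genes Xs.

    Minimises (1/2n)||y - Xs b||^2 + lam * ||b||_1 by cyclic coordinate descent and
    returns the indices of genes with non-zero coefficients, plus the coefficients.
    """
    n, p = len(Xs), len(Xs[0])
    ybar = statistics.fmean(y)
    r = [yi - ybar for yi in y]  # residual while all coefficients are zero
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in Xs]
            rho = sum(c * (ri + c * beta[j]) for c, ri in zip(col, r)) / n
            new = soft(rho, lam)  # columns have unit variance, so no rescaling needed
            if new != beta[j]:
                delta = new - beta[j]
                r = [ri - c * delta for ri, c in zip(r, col)]
                beta[j] = new
    return [j for j, b in enumerate(beta) if abs(b) > 1e-8], beta


def loo_accuracy(Xs, labels, genes):
    """Leave-one-out nearest-centroid classification ratio using only `genes`."""
    hits = 0
    for i in range(len(Xs)):
        cents = {}
        for lab in set(labels):
            rows = [Xs[k] for k in range(len(Xs)) if k != i and labels[k] == lab]
            cents[lab] = [statistics.fmean(row[g] for row in rows) for g in genes]

        def dist(lab):
            return sum((Xs[i][g] - m) ** 2 for g, m in zip(genes, cents[lab]))

        hits += min(cents, key=dist) == labels[i]
    return hits / len(Xs)


def permutation_p(Xs, labels, genes, n_perm=200, seed=0):
    """Observed CR and its P-value against random gene sets of the same size."""
    rng = random.Random(seed)
    observed = loo_accuracy(Xs, labels, genes)
    p = len(Xs[0])
    hits = sum(
        loo_accuracy(Xs, labels, rng.sample(range(p), len(genes))) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    labels = [1] * 10 + [-1] * 10  # toy labels: 1 = tumour, -1 = normal
    # Synthetic expression: genes 0 and 1 are shifted in tumours, the other 28 are noise.
    X = [[rng.gauss(0, 1) + (2.5 if g < 2 and lab == 1 else 0.0) for g in range(30)]
         for lab in labels]
    Xs = standardize(X)
    genes, _ = sparse_select(Xs, labels, lam=0.3)
    top_anova = sorted(range(30), key=lambda g: -anova_f([r[g] for r in X], labels))[:len(genes)]
    cr, p = permutation_p(Xs, labels, genes)
    print("sparse-selected genes:", genes, "ANOVA top genes:", top_anova)
    print(f"CR = {cr:.2f}, permutation P = {p:.3f}")
```

Selecting genes on all samples before the leave-one-out evaluation, as this sketch does, makes the CR optimistic. Nesting the selection inside each fold would avoid that bias at extra computational cost.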