College of Computer Science, Sichuan University, Chengdu, 610064, Sichuan, China.
College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China.
BMC Bioinformatics. 2020 May 19;21(1):195. doi: 10.1186/s12859-020-03544-z.
Gene expression-based clinical modelling in tumorigenesis aims not only to predict clinical endpoints accurately, but also to reveal genomic characteristics for downstream analysis that helps explain the mechanisms of cancer. Most conventional machine learning methods involve a gene filtering step, in which tens of thousands of genes are first filtered by expression level using a statistical method with an arbitrary cutoff. Although gene filtering reduces the feature dimension and helps avoid overfitting, it risks discarding pathogenic genes that are important to the disease.
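To make the filtering risk concrete, here is a minimal sketch of a cutoff-based gene filter. The variance statistic, the gene names, the expression values, and the function name `filter_genes_by_variance` are all illustrative assumptions. The abstract does not say which statistic or cutoff the conventional methods use.

```python
from statistics import pvariance


def filter_genes_by_variance(expression, cutoff):
    """Keep genes whose variance across samples is at least `cutoff`.

    `expression` maps a gene name to its expression values across samples.
    Variance is only one possible filtering statistic (an assumption here).
    """
    return {
        gene: values
        for gene, values in expression.items()
        if pvariance(values) >= cutoff
    }


# Hypothetical expression matrix: three genes measured in four samples.
expression = {
    "GENE_A": [1.0, 9.0, 1.0, 9.0],  # strongly varying
    "GENE_B": [5.0, 5.2, 4.8, 5.0],  # nearly constant
    "GENE_C": [2.0, 4.0, 2.0, 4.0],  # moderately varying
}

# The set of surviving genes depends entirely on the chosen cutoff.
kept = filter_genes_by_variance(expression, cutoff=1.5)
kept_lenient = filter_genes_by_variance(expression, cutoff=0.5)
```

With the stricter cutoff only `GENE_A` survives, while the more lenient one also keeps `GENE_C`. If `GENE_C` were relevant to prognosis, the stricter and equally arbitrary choice would have silently removed it.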
In this study, we propose a novel deep learning approach that combines a convolutional neural network with the stationary wavelet transform (SWT-CNN) to stratify cancer patients and predict their clinical outcomes from tumor genomic profiles without gene filtering. The proposed SWT-CNN outperformed state-of-the-art algorithms, including support vector machine (SVM) and logistic regression (LR), and achieved prediction performance comparable to that of random forest (RF). Furthermore, for all the cancer types, we first proposed a method that assigns each gene a weight score by exploiting the representative features in the hidden layer of the convolutional neural network, and then selected prognostic genes for Cox proportional-hazards regression. The results showed that risk stratification was effectively improved when the identified prognostic genes were used as features, indicating that the representative features generated by SWT-CNN correlate genes well with prognostic risk in cancers and are helpful for selecting prognostic gene signatures.
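For readers unfamiliar with the stationary wavelet transform, the sketch below shows its key property: unlike the ordinary discrete wavelet transform it does not downsample, so every coefficient vector keeps one value per input position (here, per gene). The Haar wavelet, the periodic boundary handling, the number of levels, and the sample profile are assumptions for illustration only. The abstract does not state which wavelet or decomposition depth SWT-CNN uses, nor how the coefficients are passed to the network.

```python
import math


def haar_swt(signal, levels=1):
    """Undecimated (stationary) Haar wavelet transform, periodic boundaries.

    Returns a list of (approximation, detail) pairs, one per level.
    Every coefficient vector has the same length as the input.
    """
    n = len(signal)
    approx = [float(v) for v in signal]
    coefficients = []
    for level in range(levels):
        # "A trous" scheme: the two filter taps move 2**level positions apart.
        step = 2 ** level
        shifted = [approx[(i + step) % n] for i in range(n)]
        new_approx = [(a + b) / math.sqrt(2) for a, b in zip(approx, shifted)]
        detail = [(a - b) / math.sqrt(2) for a, b in zip(approx, shifted)]
        coefficients.append((new_approx, detail))
        approx = new_approx
    return coefficients


# Hypothetical expression profile for one patient (eight genes, made-up values).
profile = [2.0, 4.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]
coeffs = haar_swt(profile, levels=2)

# Level-1 coefficients are enough to recover the input exactly.
approx1, detail1 = coeffs[0]
reconstructed = [(a + d) / math.sqrt(2) for a, d in zip(approx1, detail1)]
```

Because no positions are dropped, the approximation and detail vectors can be stacked as parallel channels over the full gene axis, which is one plausible way to present an unfiltered profile to a convolutional network.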
Our results indicate that the gene expression-based SWT-CNN model can be an excellent tool for stratifying prognostic risk in cancer patients. In addition, the representative features of SWT-CNN were shown to be useful for evaluating the importance of genes in risk stratification and can be further used to identify prognostic gene signatures.