School of Mathematics and Statistics at Shandong University, China.
Academic leader of Computer Engineering at Shandong University, China.
Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa189.
Cancer is a highly heterogeneous disease caused by dysregulation in different cell types and tissues. However, different cancers may share common mechanisms. It is critical to identify the decisive genes involved in the development and progression of cancer, and joint analysis of multiple cancers may help to uncover mechanisms that overlap among them. In this study, we propose a fusion feature selection framework based on ensemble methods, named Fisher score and Gradient Boosting Decision Tree (FS-GBDT), to select robust and decisive feature genes in high-dimensional gene expression datasets. A joint analysis of 11 human cancer types was conducted to explore the key feature gene subset of cancer. To verify the efficacy of FS-GBDT, we compared it with four other common feature selection algorithms using a Support Vector Machine (SVM) classifier. FS-GBDT achieved the highest values on the evaluation indicators and outperformed the other four methods. In addition, we performed gene ontology analysis and literature validation of the key gene subset, and the subset was classified into several functional modules. Functional modules can serve as disease markers in place of single genes, which are difficult to identify reproducibly in gene chip applications, and can be used to study the core mechanisms of cancer.
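The abstract names the two components of FS-GBDT but does not say how they are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a two-stage scheme: a Fisher score filter keeps the top-ranked features, then a gradient boosting classifier ranks the survivors by importance, and an SVM is trained on the final subset. The function names `fisher_score` and `select_features`, the cut-offs `n_filter` and `n_final`, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def fisher_score(X, y):
    """Per-feature Fisher score: between-class scatter over within-class scatter."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        X_c = X[y == label]
        n_c = X_c.shape[0]
        between += n_c * (X_c.mean(axis=0) - overall_mean) ** 2
        within += n_c * X_c.var(axis=0)
    # Small constant avoids division by zero for constant features.
    return between / (within + 1e-12)


def select_features(X, y, n_filter=50, n_final=10, seed=0):
    """Assumed two-stage selection: Fisher score filter, then GBDT importance ranking."""
    # Stage 1: keep the n_filter features with the highest Fisher score.
    filtered = np.argsort(fisher_score(X, y))[::-1][:n_filter]
    # Stage 2: rank the surviving features by GBDT importance.
    gbdt = GradientBoostingClassifier(n_estimators=50, random_state=seed)
    gbdt.fit(X[:, filtered], y)
    order = np.argsort(gbdt.feature_importances_)[::-1][:n_final]
    return filtered[order]


# Synthetic stand-in for a gene expression matrix (samples x genes).
X, y = make_classification(
    n_samples=200, n_features=200, n_informative=10, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
selected = select_features(X, y)
# Score the selected subset with a linear SVM, as in the comparison described above.
accuracy = cross_val_score(SVC(kernel="linear"), X[:, selected], y, cv=5).mean()
print("selected feature indices:", selected)
print("mean cross-validated accuracy:", round(accuracy, 3))
```

For simplicity the sketch selects features on the full dataset before cross-validation. In a real evaluation the selection step would normally be repeated inside each training fold so that the test folds do not influence which features are chosen.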