College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China.
School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266525, China.
Cells. 2022 Aug 8;11(15):2456. doi: 10.3390/cells11152456.
Cancer is a highly heterogeneous disease: even a single cancer type can be further divided into subtypes according to its pathology. As multi-omics data are now widely used for cancer subtype identification, effective feature selection is essential for identifying subtypes accurately. However, feature selection in existing subtype identification methods does not select the most informative features from a biomolecular perspective, and it does not reflect the relationships among the selected features. To solve this problem, we propose HSSG, a feature selection method for cancer subtype identification based on the heterogeneity score of a single gene. In the proposed method, a sample-similarity network is constructed for each gene, and a pseudo-F statistic is used to calculate each gene's heterogeneity score for cancer subtype identification. Finally, we construct gene-gene networks from the genes with higher heterogeneity scores and mine essential genes from these networks. In three experiments on seven TCGA data sets, covering cancer subtype identification in single-omics data, feature selection performance on multi-omics data, and the effectiveness and stability of the selected features, HSSG achieves good performance throughout. This indicates that HSSG can effectively select features for subtype identification.
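The scoring step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state which distance underlies the sample-similarity network or how samples are grouped, so the code below assumes absolute differences in a single gene's expression as pairwise distances, takes the group labels as given, and uses the standard distance-based pseudo-F statistic (as in PERMANOVA). The function names `pseudo_f` and `gene_heterogeneity_score` are hypothetical.

```python
from itertools import combinations


def pseudo_f(dist, labels):
    """Distance-based pseudo-F statistic for a given grouping of samples.

    dist   : square list-of-lists of pairwise distances between samples
    labels : one group label per sample (assumed to be supplied by the caller)
    """
    n = len(labels)
    # Collect sample indices belonging to each group.
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    a = len(groups)
    if a < 2 or n <= a:
        raise ValueError("need at least two groups and more samples than groups")

    # Total sum of squares: squared distances over all sample pairs, divided by n.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: the same quantity computed inside each group.
    ss_within = sum(
        sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
        for members in groups.values()
    )
    ss_among = ss_total - ss_within

    if ss_within == 0:
        # Perfectly tight groups: infinite separation unless nothing varies at all.
        return float("inf") if ss_among > 0 else 0.0
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def gene_heterogeneity_score(values, labels):
    """Score one gene: larger values mean the gene separates the groups better.

    Assumption (not from the source): the distance between two samples is the
    absolute difference of this gene's expression values.
    """
    dist = [[abs(x - y) for y in values] for x in values]
    return pseudo_f(dist, labels)
```

Under these assumptions, a gene with expression values `[1, 2, 3, 11, 12, 13]` and labels `[0, 0, 0, 1, 1, 1]` receives a score of 150, while the same values interleaved across the two groups (`[1, 12, 2, 13, 3, 11]`) score about 0.74. Ranking genes by such a score and keeping the highest-scoring ones corresponds to the selection step that precedes the gene-gene network construction.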