Wang Yangyang, Gao Xiaoguang, Ru Xinxin, Sun Pengzhan, Wang Jihan
School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China.
Xi'an Key Laboratory of Stem Cell and Regenerative Medicine, Institute of Medical Research, Northwestern Polytechnical University, Xi'an 710072, China.
Entropy (Basel). 2023 Jun 29;25(7):1003. doi: 10.3390/e25071003.
Feature selection is important for improving classification performance and for reducing the dimensionality of high-dimensional datasets, such as high-throughput genomics and proteomics data in bioinformatics. Information theory, a popular approach valued for its computational efficiency and scalability, has been widely incorporated into feature selection. In this study, we propose a unique weight-based feature selection (WBFS) algorithm that assesses both selected features and candidate features, and we use it to identify key protein biomarkers for classifying lung cancer subtypes in The Cancer Proteome Atlas (TCPA) database. We further performed a survival analysis relating the selected biomarkers to lung cancer subtypes. The results show that combining our WBFS method with a Bayesian network performs well for mining potential biomarkers. These candidate signatures have valuable biological significance for tumor classification and patient survival analysis. Taken together, this study proposes the WBFS method, which helps to explore candidate biomarkers in biomedical datasets and provides useful information for tumor diagnosis or therapy strategies.
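To illustrate the general idea of information-theoretic feature selection that scores each candidate feature against the features already selected, the following is a minimal sketch. It is not the WBFS algorithm: the abstract does not specify the WBFS weighting, so a generic greedy "relevance minus redundancy" criterion is swapped in. The function names `mutual_information` and `greedy_select`, the fixed weight `beta`, and the toy data are all assumptions made for illustration, and the features are assumed to be discrete (for example, binned protein expression levels).

```python
import numpy as np

def mutual_information(x, y):
    """Mutual information I(X;Y) in nats for two discrete 1-D arrays."""
    _, x_idx = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, y_idx = np.unique(np.asarray(y).ravel(), return_inverse=True)
    # Build the joint distribution from co-occurrence counts.
    joint = np.zeros((x_idx.max() + 1, y_idx.max() + 1))
    np.add.at(joint, (x_idx, y_idx), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of X, shape (a, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, b)
    nz = joint > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def greedy_select(X, y, k, beta=1.0):
    """Greedily pick k feature indices (illustrative criterion, not WBFS).

    Score of a candidate = relevance to the class label
                           - beta * mean redundancy with selected features.
    """
    n_features = X.shape[1]
    relevance = [mutual_information(X[:, j], y) for j in range(n_features)]
    selected = []
    candidates = list(range(n_features))
    while candidates and len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in candidates:
            if selected:
                redundancy = np.mean(
                    [mutual_information(X[:, j], X[:, s]) for s in selected]
                )
            else:
                redundancy = 0.0
            score = relevance[j] - beta * redundancy
            if score > best_score:          # ties keep the lower index
                best_j, best_score = j, score
        selected.append(best_j)
        candidates.remove(best_j)
    return selected

# Toy data (hypothetical): feature 1 duplicates feature 0,
# feature 2 is weaker but carries less redundant information.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.column_stack([
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0, 1],
])
print(greedy_select(X, y, k=2))
```

With the redundancy term switched off (`beta=0.0`), the duplicated feature would be chosen second, which is the behavior that criteria comparing candidate features with selected features are designed to avoid.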