Tasci Erdal, Jagasia Sarisha, Zhuge Ying, Sproull Mary, Cooley Zgela Theresa, Mackey Megan, Camphausen Kevin, Krauze Andra Valentina
Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD 20892, USA.
Cancers (Basel). 2023 May 9;15(10):2672. doi: 10.3390/cancers15102672.
Glioblastomas (GBM) are rapidly growing, aggressive, nearly uniformly fatal, and the most common primary brain cancers. They exhibit significant heterogeneity and resistance to treatment, which limits the ability to analyze the dynamic biological behavior that drives response and resistance, an analysis central to advancing outcomes in glioblastoma. Analysis of the proteome aimed at signal change over time offers a potential opportunity for non-invasive classification and examination of the response to treatment by identifying protein biomarkers associated with interventions. However, data acquired using large proteomic panels must be made more intuitively interpretable, which requires computational analysis to identify trends. Machine learning is increasingly employed for this purpose, but it requires feature selection, which has a critical and considerable effect when applied to large-scale data: it reduces the number of parameters, improves generalization, and finds essential predictors. In this study, using 7k proteomic data generated from the analysis of serum obtained from 82 patients with GBM before and after completion of concurrent chemoirradiation (CRT), we aimed to select the most discriminative proteomic features that define the proteomic alteration resulting from CRT. To this end, we present a novel rank-based feature weighting method (RadWise) that identifies relevant proteomic parameters using two popular feature selection methods, the least absolute shrinkage and selection operator (LASSO) and minimum redundancy maximum relevance (mRMR). The computational results show that the proposed method performs well with very few selected proteomic features, achieving a higher accuracy rate than methods that do not employ a feature selection process.
While the computational method identified several proteomic signals identical to those identified by clinical intuition (the heuristic approach), several heuristically identified proteomic signals were not selected, and other novel proteomic biomarkers that carry biological prognostic relevance in GBM, and that the heuristic approach did not select, emerged only with the novel method. Overall, the proposed method reduces 7k proteomic data to 8 selected proteomic features with a performance value of 96.364%, comparing favorably with techniques that do not employ feature selection.
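The abstract does not specify how RadWise combines the LASSO and mRMR rankings, so the following is only a minimal sketch of one generic way to fuse two feature rankings by rank-based weights. The linear weighting scheme, the function names `rank_weights` and `fuse_rankings`, the `lasso_weight` parameter, and the protein labels are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def rank_weights(ranked_features):
    """Assumed scheme: the top-ranked feature gets weight 1.0, the last gets 1/n."""
    n = len(ranked_features)
    return {name: (n - pos) / n for pos, name in enumerate(ranked_features)}


def fuse_rankings(lasso_ranked, mrmr_ranked, lasso_weight=0.5, top_k=8):
    """Combine two best-first feature rankings into one list of top_k features.

    A feature missing from one ranking contributes 0 from that ranking.
    """
    mrmr_weight = 1.0 - lasso_weight
    scores = defaultdict(float)
    for name, weight in rank_weights(lasso_ranked).items():
        scores[name] += lasso_weight * weight
    for name, weight in rank_weights(mrmr_ranked).items():
        scores[name] += mrmr_weight * weight
    # Highest combined score first; ties are broken alphabetically.
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return ordered[:top_k]


# Hypothetical rankings (best first) as they might come out of LASSO and mRMR.
lasso_ranked = ["P1", "P2", "P3", "P4"]
mrmr_ranked = ["P3", "P1", "P5", "P2"]
selected = fuse_rankings(lasso_ranked, mrmr_ranked, top_k=3)
print(selected)
```

In this toy example, features ranked highly by both methods rise to the top of the fused list, which is the general idea behind reducing a very large panel to a handful of candidate features before training a classifier.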