Thomas Morgan P H, Ajaib Shoaib, Tanner Georgette, Bulpitt Andrew J, Stead Lucy F
School of Computer Science, University of Leeds, UK.
Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, UK.
Neuro Oncol. 2025 Feb 1. doi: 10.1093/neuonc/noaf026.
Glioblastoma (GBM) presents a significant clinical challenge due to its aggressive nature and extensive heterogeneity. Tumour purity, the proportion of malignant cells within a tumour, is an important covariate for understanding the disease: it has direct clinical relevance and, if unaccounted for, can obscure the signal from the malignant portion in molecular analyses of bulk samples. However, current methods for estimating tumour purity are non-specific and technically demanding. Therefore, we aimed to build a reliable and accessible purity estimator for GBM.
We developed GBMPurity, a deep-learning model specifically designed to estimate the purity of IDH-wildtype primary GBM from bulk RNA-seq data. The model was trained using simulated pseudobulk tumours of known purity from labelled single-cell data acquired from the GBmap resource. The performance of GBMPurity was evaluated and compared to several existing tools using independent datasets.
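As an illustration of how a pseudobulk tumour of known purity might be simulated from labelled single-cell data, the following is a minimal sketch and not the published GBMPurity pipeline. The function name `simulate_pseudobulk`, the uniform purity draw, the fixed number of sampled cells, and the definition of purity as a fraction of cells are all assumptions made for the example.

```python
import numpy as np

def simulate_pseudobulk(counts, is_malignant, n_cells=500, seed=None):
    """Build one pseudobulk profile and its purity label (hypothetical helper).

    counts:       (cells x genes) array of single-cell expression counts
    is_malignant: boolean array, True where a cell is labelled malignant
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    is_malignant = np.asarray(is_malignant, dtype=bool)

    malignant_idx = np.flatnonzero(is_malignant)
    normal_idx = np.flatnonzero(~is_malignant)

    # Assumption: target purity is drawn uniformly from [0, 1].
    target = rng.uniform(0.0, 1.0)
    n_malignant = int(round(target * n_cells))
    n_normal = n_cells - n_malignant

    # Sample cells with replacement from each compartment.
    chosen = np.concatenate([
        rng.choice(malignant_idx, size=n_malignant, replace=True),
        rng.choice(normal_idx, size=n_normal, replace=True),
    ]).astype(np.intp)

    # Summing the sampled cells gives the bulk-like expression profile.
    profile = counts[chosen].sum(axis=0)
    purity = n_malignant / n_cells  # label is the realised cell fraction
    return profile, purity
```

Repeating this draw many times would yield (profile, purity) pairs on which a regression model could be trained; how the actual model normalises expression or weights cells is not stated in the abstract.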
GBMPurity outperformed existing tools, achieving a mean absolute error of 0.15 and a concordance correlation coefficient of 0.88 on validation datasets. We demonstrate the utility of GBMPurity through inference on bulk RNA-seq samples and observe reduced purity of the Proneural molecular subtype relative to the Classical, attributed to the increased presence of healthy brain cells.
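For readers unfamiliar with the two reported metrics, the sketch below shows how mean absolute error and Lin's concordance correlation coefficient can be computed for paired true and predicted purities. The function names are illustrative, and the use of population (biased) variance in the coefficient is an assumption; the abstract does not say which estimator the authors used.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted purity."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def concordance_correlation(y_true, y_pred):
    """Lin's concordance correlation coefficient (population moments)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    # Penalises both poor correlation and systematic offset between the means.
    return float(2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2))
```

Unlike Pearson correlation, the concordance coefficient falls below 1 when predictions are perfectly correlated with the truth but shifted or rescaled, which makes it a natural choice for judging agreement with known purities.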
GBMPurity provides a reliable and accessible tool for estimating tumour purity from bulk RNA-seq data, enhancing the interpretation of such data and offering valuable insights into GBM biology. To facilitate the use of this model by the wider research community, GBMPurity is available as a web-based tool at: https://gbmdeconvoluter.leeds.ac.uk/.