Leveraging single-cell sequencing to classify and characterize tumor subgroups in bulk RNA-sequencing data.

Affiliations

Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA.

McGovern Medical School, Houston, TX, USA.

Publication information

J Neurooncol. 2024 Jul;168(3):515-524. doi: 10.1007/s11060-024-04710-6. Epub 2024 May 29.

DOI:10.1007/s11060-024-04710-6

PMID:38811523

Abstract

PURPOSE

Accurate classification of cancer subgroups is essential for precision medicine, tailoring treatments to individual patients based on their cancer subtypes. In recent years, advances in high-throughput sequencing technologies have enabled the generation of large-scale transcriptomic data from cancer samples. These data have provided opportunities for developing computational methods that can improve cancer subtyping and enable better personalized treatment strategies.

METHODS

In this study, we evaluated different feature selection schemes for meningioma classification. To integrate interpretable features from bulk (n = 77 samples) and single-cell (~10 K cells) profiling, we developed an algorithm named CLIPPR, which combines the top-performing single-cell models, RNA-inferred copy number variation (CNV) signals, and the initial bulk model to create a meta-model.
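The abstract does not specify how the meta-model combines its inputs. As a minimal sketch of the general idea only, the following assumes each base model (bulk, single-cell, CNV) emits class probabilities that are merged by weighted soft voting. The class labels, model names, weights, and probabilities are all hypothetical, and this is not the CLIPPR implementation.

```python
from typing import Dict, List

# Hypothetical class labels; the abstract only refers to benign and malignant molecular classes.
CLASSES = ["benign", "malignant"]


def combine(base_probs: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-class probabilities from several base models."""
    total = sum(weights[name] for name in base_probs)
    combined = [0.0] * len(CLASSES)
    for name, probs in base_probs.items():
        for i, p in enumerate(probs):
            combined[i] += weights[name] * p / total
    return combined


def predict(base_probs: Dict[str, List[float]], weights: Dict[str, float]) -> str:
    """Return the class with the highest combined probability."""
    combined = combine(base_probs, weights)
    return CLASSES[combined.index(max(combined))]


# Toy sample (made-up numbers): the bulk model alone leans benign,
# while the single-cell and CNV models lean malignant.
sample = {
    "bulk": [0.55, 0.45],
    "single_cell": [0.20, 0.80],
    "cnv": [0.30, 0.70],
}
weights = {"bulk": 1.0, "single_cell": 1.0, "cnv": 1.0}  # assumed equal weights

label = predict(sample, weights)
```

In this toy case the combined vote overrides the bulk model's borderline call, which is the kind of disagreement a meta-model is meant to arbitrate. A trained meta-model would learn its combination rule from data rather than use fixed weights.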

RESULTS

While the scheme relying solely on bulk transcriptomic data showed good classification accuracy, it confused malignant and benign molecular classes in approximately 8% of meningioma samples. In contrast, models trained on features learned from meningioma single-cell data accurately resolved the sub-groups confused by bulk transcriptomic data but showed limited overall accuracy. CLIPPR showed superior overall accuracy and resolved the benign-malignant confusion, as validated on n = 789 bulk meningioma samples gathered from multiple institutions. Finally, we showed the generalizability of our algorithm using our in-house single-cell (~200 K cells) and bulk TCGA glioma data (n = 711 samples).
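A benign-malignant confusion rate like the one reported above can be read as the fraction of samples whose true and predicted labels fall on opposite sides of that pair. The sketch below illustrates that reading under stated assumptions: the function name, the label strings (including "intermediate"), and the toy data are hypothetical, and the study's exact definition may differ.

```python
from typing import Sequence


def cross_class_confusion_rate(
    true_labels: Sequence[str],
    predicted_labels: Sequence[str],
    group_a: str = "benign",
    group_b: str = "malignant",
) -> float:
    """Fraction of all samples where one of group_a/group_b was predicted as the other."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    confused = sum(
        1
        for t, p in zip(true_labels, predicted_labels)
        if {t, p} == {group_a, group_b}
    )
    return confused / len(true_labels)


# Toy labels (made up for illustration; not data from the study).
true_toy = ["benign", "benign", "malignant", "malignant", "intermediate"]
pred_toy = ["benign", "malignant", "malignant", "malignant", "benign"]

rate = cross_class_confusion_rate(true_toy, pred_toy)
```

Only the second sample counts as confused here; the misclassified "intermediate" sample lowers overall accuracy but is not a benign-malignant confusion.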

CONCLUSION

Overall, our algorithm CLIPPR synergizes the resolution of single-cell data with the depth of bulk sequencing and enables improved cancer sub-group diagnoses and insights into their biology.
