Yu Qiaoni, Li Yuan-Yuan, Chen Yunqin
Shanghai-MOST Key Laboratory of Health and Disease Genomics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China.
Shanghai Genbase Biotechnology Co., Ltd, Shanghai, China.
Commun Biol. 2025 Mar 27;8(1):504. doi: 10.1038/s42003-025-07942-y.
Single-cell RNA sequencing (scRNA-seq) is a powerful tool for characterizing tumor heterogeneity, yet accurately identifying malignant cells remains challenging. Here, we propose scMalignantFinder, a machine learning tool specifically designed to distinguish malignant cells from their normal counterparts using a data- and knowledge-driven strategy. To develop the tool, multiple cancer datasets were collected, and the initially annotated malignant cells were calibrated using nine carefully curated pan-cancer gene signatures, resulting in over 400,000 single-cell transcriptomes for training. The union of differentially expressed genes across datasets was taken as the features for model construction to comprehensively capture tumor transcriptional diversity. scMalignantFinder outperformed existing automated methods across two gold-standard and eleven patient-derived scRNA-seq datasets. The capability to predict malignancy probability empowers scMalignantFinder to capture dynamic characteristics during tumor progression. Furthermore, scMalignantFinder holds the potential to annotate malignant regions in tumor spatial transcriptomics. Overall, we provide an efficient tool for detecting heterogeneous malignant cell populations.
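The feature-construction step described above, taking the union of differentially expressed genes (DEGs) across datasets, can be illustrated with a minimal sketch. This is not the scMalignantFinder implementation. The function names, dataset labels, gene lists, and expression values below are all hypothetical placeholders chosen for illustration, and the abstract does not specify how DEGs are computed or how genes missing from a cell are handled (zero filling is assumed here).

```python
from typing import Dict, Iterable, List


def union_features(deg_sets: Iterable[Iterable[str]]) -> List[str]:
    """Return the sorted union of per-dataset DEG lists (hypothetical helper)."""
    features = set()
    for degs in deg_sets:
        features.update(degs)
    # Sorting gives a stable column order shared by training and prediction.
    return sorted(features)


def to_feature_vector(cell: Dict[str, float], features: List[str]) -> List[float]:
    """Align one cell's expression to the shared feature order.

    Assumption: genes absent from the cell are filled with 0.0.
    """
    return [cell.get(gene, 0.0) for gene in features]


# Placeholder DEG lists for three hypothetical datasets (not taken from the paper).
degs_by_dataset = {
    "dataset_A": ["EPCAM", "KRT8", "MKI67"],
    "dataset_B": ["KRT8", "TOP2A"],
    "dataset_C": ["MKI67", "SOX4"],
}

features = union_features(degs_by_dataset.values())

# A placeholder cell; PTPRC is not in the feature union, so it is dropped.
cell = {"KRT8": 2.5, "SOX4": 1.0, "PTPRC": 3.0}
vector = to_feature_vector(cell, features)
```

Using the union rather than the intersection keeps genes that are informative in only some cancer types, which matches the stated aim of capturing tumor transcriptional diversity. The cost is a larger and sparser feature space. The resulting vectors could then be passed to any classifier that outputs a probability.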