Programme in Cardiovascular and Metabolic Disorders, Centre for Computational Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore.
The School of Mechanical and Aerospace Engineering and the School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Ave, Singapore 639798, Singapore.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae482.
Single-cell RNA sequencing (scRNA-seq) technologies can generate transcriptomic profiles at a single-cell resolution in large patient cohorts, facilitating discovery of gene and cellular biomarkers for disease. Yet, when the number of biomarker genes is large, the translation to clinical applications is challenging due to prohibitive sequencing costs. Here, we introduce scPanel, a computational framework designed to bridge the gap between biomarker discovery and clinical application by identifying a sparse gene panel for patient classification from the cell population(s) most responsive to perturbations (e.g. diseases/drugs). scPanel incorporates a data-driven way to automatically determine a minimal number of informative biomarker genes. Patient-level classification is achieved by aggregating the prediction probabilities of cells associated with a patient using the area under the curve score. Application of scPanel to scleroderma, colorectal cancer, and COVID-19 datasets resulted in high patient classification accuracy using only a small number of genes (<20), automatically selected from the entire transcriptome. In the COVID-19 case study, we demonstrated cross-dataset generalizability in predicting disease state in an external patient cohort. scPanel outperforms other state-of-the-art gene selection methods for patient classification and can be used to identify parsimonious sets of reliable biomarker candidates for clinical translation.
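The patient-level aggregation step described above can be illustrated with a minimal sketch. The abstract does not define how the area under the curve is computed, so the code below assumes one plausible reading: for each patient, the fraction of cells predicted positive is traced across probability thresholds from 0 to 1, and the area under that curve is taken as the patient score. The function name `patient_auc_scores`, its parameters, and the toy data are hypothetical and are not taken from the scPanel package.

```python
import numpy as np


def patient_auc_scores(cell_probs, patient_ids, n_thresholds=101):
    """Aggregate per-cell probabilities into one score per patient.

    Hypothetical helper (not the scPanel API). Assumes the "AUC score" is the
    area under the curve of "fraction of a patient's cells with probability
    >= t" as the threshold t runs from 0 to 1.
    """
    cell_probs = np.asarray(cell_probs, dtype=float)
    patient_ids = np.asarray(patient_ids)
    thresholds = np.linspace(0.0, 1.0, n_thresholds)
    widths = np.diff(thresholds)

    scores = {}
    for pid in np.unique(patient_ids):
        probs = cell_probs[patient_ids == pid]
        # Fraction of this patient's cells called positive at each threshold.
        frac_positive = (probs[:, None] >= thresholds[None, :]).mean(axis=0)
        # Trapezoidal rule over the threshold grid.
        heights = (frac_positive[1:] + frac_positive[:-1]) / 2.0
        scores[pid.item()] = float(np.sum(widths * heights))
    return scores


# Toy example: three cells each from two made-up patients.
probs = [0.9, 0.8, 0.95, 0.1, 0.2, 0.05]
patients = ["P1", "P1", "P1", "P2", "P2", "P2"]
scores = patient_auc_scores(probs, patients)
# Assumed decision rule: call a patient positive when the score exceeds 0.5.
labels = {pid: score > 0.5 for pid, score in scores.items()}
print(scores, labels)
```

Under this assumed definition the score converges to the mean cell-level probability as the threshold grid becomes finer, so it behaves like a soft vote over a patient's cells. The 0.5 cut-off is likewise an illustrative choice, not one stated in the abstract.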