CellBRF: a feature selection method for single-cell clustering using cell balance and random forest.

Affiliations

School of Computer Science and Engineering, Central South University, Changsha 410083, China.

Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China.

Publication information

Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i368-i376. doi: 10.1093/bioinformatics/btad216.

Abstract

MOTIVATION

Single-cell RNA sequencing (scRNA-seq) offers a powerful tool to dissect the complexity of biological tissues through cell sub-population identification in combination with clustering approaches. Feature selection is a critical step for improving the accuracy and interpretability of single-cell clustering. Existing feature selection methods underutilize the discriminatory potential of genes across distinct cell types. We hypothesize that incorporating such information could further boost the performance of single-cell clustering.

RESULTS

We develop CellBRF, a feature selection method that considers genes' relevance to cell types for single-cell clustering. The key idea is to identify genes that are most important for discriminating cell types through random forests guided by predicted cell labels. Moreover, it proposes a class balancing strategy to mitigate the impact of unbalanced cell type distributions on feature importance evaluation. We benchmark CellBRF on 33 scRNA-seq datasets representing diverse biological scenarios and demonstrate that it substantially outperforms state-of-the-art feature selection methods in terms of clustering accuracy and cell neighborhood consistency. Furthermore, we demonstrate the outstanding performance of our selected features through three case studies on cell differentiation stage identification, non-malignant cell subtype identification, and rare cell identification. CellBRF provides a new and effective tool to boost single-cell clustering accuracy.
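The core idea described above (cluster to obtain predicted labels, balance the clusters, then rank genes by random-forest importance) can be illustrated with a minimal sketch. This is not CellBRF's actual implementation, which is in the repository linked under Availability and Implementation. It rests on several assumptions: KMeans stands in for whatever clustering produces the predicted labels, resampling every cluster to the median cluster size is one assumed balancing strategy, and the function names `balance_classes` and `select_genes` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier


def balance_classes(X, labels, rng):
    """Resample every predicted cluster to the median cluster size (assumed strategy)."""
    classes, counts = np.unique(labels, return_counts=True)
    target = int(np.median(counts))
    picked = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(labels == c)
        # Small clusters are oversampled with replacement,
        # large clusters are undersampled without replacement.
        picked.append(rng.choice(members, size=target, replace=bool(n < target)))
    idx = np.concatenate(picked)
    return X[idx], labels[idx]


def select_genes(X, n_clusters, n_genes, seed=0):
    """Return indices of the n_genes columns ranked most important by a random forest."""
    rng = np.random.default_rng(seed)
    # Step 1: predicted cell labels (KMeans is a stand-in clustering method).
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    # Step 2: balance the predicted clusters before fitting the forest.
    X_bal, y_bal = balance_classes(X, labels, rng)
    # Step 3: fit a random forest on the balanced data and rank genes by importance.
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X_bal, y_bal)
    order = np.argsort(forest.feature_importances_)[::-1]
    return order[:n_genes]


# Synthetic usage example: 440 cells x 50 genes, three unbalanced cell types.
# Only the first 5 genes carry cell-type signal; the rest are noise.
demo_rng = np.random.default_rng(1)
sizes, means = [300, 100, 40], [0.0, 6.0, 12.0]
X_demo = demo_rng.normal(size=(sum(sizes), 50))
X_demo[:, :5] += np.repeat(means, sizes)[:, None]
selected = select_genes(X_demo, n_clusters=3, n_genes=5)
print(sorted(selected.tolist()))
```

Balancing before fitting matters because impurity-based importances are weighted by sample counts, so without it genes that distinguish a rare cell type contribute little to the ranking.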

AVAILABILITY AND IMPLEMENTATION

All source code for CellBRF is freely available at https://github.com/xuyp-csu/CellBRF.

Figure (btad216f1.jpg): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a79c/10311305/83fa1b8a4732/btad216f1.jpg
