CellBRF: a feature selection method for single-cell clustering using cell balance and random forest.

Affiliations

School of Computer Science and Engineering, Central South University, Changsha 410083, China.

Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China.

Publication information

Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i368-i376. doi: 10.1093/bioinformatics/btad216.

DOI: 10.1093/bioinformatics/btad216
PMID: 37387178
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10311305/
Abstract

MOTIVATION

Single-cell RNA sequencing (scRNA-seq) offers a powerful tool to dissect the complexity of biological tissues through cell sub-population identification in combination with clustering approaches. Feature selection is a critical step for improving the accuracy and interpretability of single-cell clustering. Existing feature selection methods underutilize the discriminatory potential of genes across distinct cell types. We hypothesize that incorporating such information could further boost the performance of single-cell clustering.

RESULTS

We develop CellBRF, a feature selection method that considers genes' relevance to cell types for single-cell clustering. The key idea is to identify genes that are most important for discriminating cell types through random forests guided by predicted cell labels. Moreover, it proposes a class balancing strategy to mitigate the impact of unbalanced cell type distributions on feature importance evaluation. We benchmark CellBRF on 33 scRNA-seq datasets representing diverse biological scenarios and demonstrate that it substantially outperforms state-of-the-art feature selection methods in terms of clustering accuracy and cell neighborhood consistency. Furthermore, we demonstrate the outstanding performance of our selected features through three case studies on cell differentiation stage identification, non-malignant cell subtype identification, and rare cell identification. CellBRF provides a new and effective tool to boost single-cell clustering accuracy.
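The core idea described above (cluster the cells to get predicted labels, balance the predicted classes, then rank genes by random forest importance) can be illustrated with a minimal sketch. This is not the authors' implementation; their code is in the repository linked under Availability. Several choices here are assumptions made only for illustration: `KMeans` stands in for whatever clustering step produces the predicted labels, every cluster is resampled to the median cluster size, and the helper names `balance_classes` and `select_genes` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier


def balance_classes(X, labels, rng):
    """Resample every predicted cluster to the median cluster size.

    The median target is an illustrative assumption, not taken from the paper.
    """
    classes, counts = np.unique(labels, return_counts=True)
    target = int(np.median(counts))
    chosen = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(labels == cls)
        # Small clusters are oversampled with replacement,
        # large clusters are downsampled without replacement.
        chosen.append(rng.choice(idx, size=target, replace=count < target))
    chosen = np.concatenate(chosen)
    return X[chosen], labels[chosen]


def select_genes(X, n_clusters, n_genes, seed=0):
    """Rank genes (columns of X) by random forest importance on balanced data."""
    rng = np.random.default_rng(seed)
    # Predicted cell labels; KMeans is a stand-in clustering method.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    X_bal, y_bal = balance_classes(X, labels, rng)
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X_bal, y_bal)
    # Highest importance first.
    order = np.argsort(forest.feature_importances_)[::-1]
    return order[:n_genes], labels


# Toy data: 250 cells x 50 genes, three unbalanced cell types.
# Only genes 0-4 carry cell-type signal; the rest are noise.
toy_rng = np.random.default_rng(1)
X_toy = toy_rng.normal(size=(250, 50))
X_toy[200:240, 0:3] += 8.0   # 40 cells of a second type
X_toy[240:250, 3:5] += 8.0   # 10 cells of a rare third type

top_genes, predicted = select_genes(X_toy, n_clusters=3, n_genes=5)
print(sorted(top_genes.tolist()))
```

On the toy matrix, the selected indices should be dominated by the five signal-carrying genes. Without the balancing step, the forest would be trained mostly on the 200-cell majority type, which is the kind of bias in importance estimates that the class balancing strategy is meant to reduce.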

AVAILABILITY AND IMPLEMENTATION

The source code of CellBRF is freely available at https://github.com/xuyp-csu/CellBRF.
