Statistically principled feature selection for single cell transcriptomics.

Authors

Dollinger Emmanuel, Silkwood Kai, Atwood Scott, Nie Qing, Lander Arthur D

Affiliations

Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697.

Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA 92697.

Publication information

bioRxiv. 2024 Oct 15:2024.10.11.617709. doi: 10.1101/2024.10.11.617709.

DOI: 10.1101/2024.10.11.617709
PMID: 39463971
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11507810/

Abstract

The high dimensionality of data in single cell transcriptomics (scRNAseq) requires investigators to choose subsets of genes (feature selection) for downstream analysis (e.g., unsupervised cell clustering). The evaluation of different approaches to feature selection is hampered by the fact that, as we show here, the performance of feature selection methods varies greatly with the task being performed. For routine cell type identification, even randomly chosen features can perform well, but for cell type differences that are subtle, both number of features and selection strategy can matter strongly. Here we present a simple feature selection method grounded in an analytical model that, without resorting to arbitrary thresholds or user-defined parameters, allows for interpretable delineation of both how many and which features to choose, facilitating identification of biologically meaningful rare cell types. We compare this method to default methods in scanpy and Seurat, as well as SCTransform, showing how greater accuracy can often be achieved with surprisingly few, well-chosen features.
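
To make the comparison in the abstract concrete, here is a minimal sketch of two generic feature-selection baselines: picking genes at random, and ranking genes by their variance-to-mean ratio (dispersion). This is not the authors' analytical model, which the abstract does not spell out, and it is not the scanpy, Seurat, or SCTransform procedure. The function names `select_by_dispersion` and `select_random`, the synthetic Poisson count matrix, and all parameter values are illustrative assumptions.

```python
import numpy as np


def select_by_dispersion(counts, n_features):
    """Return indices of the n_features genes with the highest variance/mean ratio.

    counts: array of shape (n_cells, n_genes) holding raw counts.
    Hypothetical helper for illustration; not the method from the paper.
    """
    mean = counts.mean(axis=0)
    var = counts.var(axis=0)
    # Genes that are never detected get dispersion 0 instead of a division by zero.
    dispersion = np.divide(var, mean, out=np.zeros_like(mean, dtype=float), where=mean > 0)
    # Sort descending by dispersion and keep the top n_features indices.
    return np.argsort(dispersion)[::-1][:n_features]


def select_random(n_genes, n_features, rng):
    """Return n_features distinct gene indices drawn uniformly at random."""
    return rng.choice(n_genes, size=n_features, replace=False)


# Synthetic toy data (an assumption, not real scRNAseq): two equally sized
# cell groups that differ only in the first n_markers genes.
rng = np.random.default_rng(0)
n_cells, n_genes, n_markers = 200, 500, 20
rates = np.full((n_cells, n_genes), 2.0)
rates[: n_cells // 2, :n_markers] = 10.0  # group A over-expresses the marker genes
counts = rng.poisson(rates)

top = select_by_dispersion(counts, n_markers)
rand = select_random(n_genes, n_markers, rng)

# Count how many of the selected genes are true markers under each strategy.
n_top_markers = int(np.sum(top < n_markers))
n_rand_markers = int(np.sum(rand < n_markers))
print("markers recovered by dispersion ranking:", n_top_markers)
print("markers recovered by random selection:", n_rand_markers)
```

In this toy setting the two groups differ strongly, so a dispersion ranking finds the marker genes easily. The abstract's point is that the choice of strategy and the number of features matter most when differences between cell types are subtle, which a simple ranking like this one does not address.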


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60d2/11507810/a84602b90c0d/nihpp-2024.10.11.617709v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60d2/11507810/c46acf8ed457/nihpp-2024.10.11.617709v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60d2/11507810/d21df9fdaf9f/nihpp-2024.10.11.617709v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60d2/11507810/ba5a48b75d2f/nihpp-2024.10.11.617709v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60d2/11507810/6dcf98c0a8f2/nihpp-2024.10.11.617709v1-f0005.jpg

