Statistically principled feature selection for single cell transcriptomics.

Authors

Dollinger Emmanuel, Silkwood Kai, Atwood Scott, Nie Qing, Lander Arthur D

Affiliations

Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697.

Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA 92697.

Publication information

bioRxiv. 2024 Oct 15:2024.10.11.617709. doi: 10.1101/2024.10.11.617709.

Abstract

The high dimensionality of data in single cell transcriptomics (scRNAseq) requires investigators to choose subsets of genes (feature selection) for downstream analysis (e.g., unsupervised cell clustering). The evaluation of different approaches to feature selection is hampered by the fact that, as we show here, the performance of feature selection methods varies greatly with the task being performed. For routine cell type identification, even randomly chosen features can perform well, but for subtle cell type differences, both the number of features and the selection strategy can matter strongly. Here we present a simple feature selection method grounded in an analytical model that, without resorting to arbitrary thresholds or user-defined parameters, allows for interpretable delineation of both how many and which features to choose, facilitating identification of biologically meaningful rare cell types. We compare this method to default methods in scanpy and Seurat, as well as SCTransform, showing how greater accuracy can often be achieved with surprisingly few, well-chosen features.
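
To make the idea of feature selection concrete, the following is a minimal sketch, not the method proposed in this preprint and not the default procedure of scanpy, Seurat, or SCTransform. It contrasts two generic strategies on a synthetic cells-by-genes count matrix: picking genes at random, and ranking genes by their variance-to-mean ratio (Fano factor). The function names, matrix sizes, and Poisson rates are all assumptions chosen for illustration.

```python
import numpy as np

def select_random(n_genes, k, rng):
    """Pick k gene indices uniformly at random, without replacement."""
    return np.sort(rng.choice(n_genes, size=k, replace=False))

def select_by_fano(counts, k):
    """Pick the k genes with the largest variance-to-mean ratio (Fano factor)."""
    mean = counts.mean(axis=0)
    var = counts.var(axis=0)
    # Genes with zero mean get a Fano factor of 0 instead of dividing by zero.
    fano = np.divide(var, mean, out=np.zeros_like(mean), where=mean > 0)
    return np.sort(np.argsort(fano)[-k:])

# Synthetic data (illustrative assumption): 400 cells, 200 genes.
rng = np.random.default_rng(0)
n_cells, n_genes, n_marker = 400, 200, 10

# Background genes: Poisson counts, so variance is close to the mean.
counts = rng.poisson(2.0, size=(n_cells, n_genes)).astype(float)

# Two hypothetical cell groups; the first 10 genes are elevated in group 1,
# which makes those genes overdispersed across the whole population.
group = np.repeat([0, 1], n_cells // 2)
counts[group == 1, :n_marker] = rng.poisson(20.0, size=(n_cells // 2, n_marker))

random_idx = select_random(n_genes, n_marker, rng)
fano_idx = select_by_fano(counts, n_marker)

print("random genes:", random_idx)
print("Fano-ranked genes:", fano_idx)
```

In this toy setting the group difference is large, so a simple dispersion ranking finds the planted genes. The abstract's point is that such easy cases say little about performance when cell type differences are subtle.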

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60d2/11507810/a84602b90c0d/nihpp-2024.10.11.617709v1-f0001.jpg