Single-cell gene set scoring with nearest neighbor graph smoothed data (gssnng).

Author information

Gibbs David L, Strasser Michael K, Huang Sui

Affiliations

Shmulevich Lab, Institute for Systems Biology, Seattle, WA 98106, United States.

Huang Lab, Institute for Systems Biology, Seattle, WA 98106, United States.

Publication information

Bioinform Adv. 2023 Oct 18;3(1):vbad150. doi: 10.1093/bioadv/vbad150. eCollection 2023.

Abstract

SUMMARY

Gene set scoring (or enrichment) is a common dimension reduction task in bioinformatics that can focus on differences between groups or operate at the single-sample level. Gene sets can represent biological functions, molecular pathways, cell identities, and more. Gene set scores are context-dependent values that are useful for interpreting biological changes following experiments or perturbations. Single-sample scoring produces one score for each member of a group, and these scores can be analyzed with statistical models that include additional clinically important factors such as gender or age. However, the sparsity and technical noise of single-cell expression measurements create difficulties for these methods, which were originally designed for bulk expression profiling (microarrays, RNA-seq). This can be substantially remedied by first applying a smoothing transformation that shares gene measurement information within transcriptomic neighborhoods. In this work, we use the nearest neighbor graph of cells for matrix smoothing to produce high-quality gene set scores at the per-cell, per-group level, which is useful for visualization and statistical analysis.
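
The smoothing-then-scoring idea described above can be illustrated with a small, self-contained sketch. It is not the gssnng implementation or its API: the function names `knn_smooth` and `mean_rank_score`, the Euclidean k-nearest-neighbor graph, the uniform neighbor weights, and the simple rank-based score are all assumptions chosen to keep the example short.

```python
import numpy as np

def knn_smooth(X, k=2):
    """Average each cell (row of X) with its k nearest neighbors.

    Hypothetical helper: uses Euclidean distance and uniform weights,
    which is an assumption, not necessarily what gssnng does.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between cells.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # Force each cell to be its own closest neighbor.
    np.fill_diagonal(d2, -np.inf)
    # Indices of the cell itself plus its k nearest neighbors.
    nn = np.argsort(d2, axis=1, kind="stable")[:, : k + 1]
    # Row-normalized neighbor weights: each row sums to 1.
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], nn] = 1.0 / (k + 1)
    return W @ X

def mean_rank_score(X, gene_idx):
    """Per-cell score: mean within-cell rank of the set's genes, scaled to [0, 1].

    Ties are broken by gene order here; a real scorer would handle them explicitly.
    """
    order = np.argsort(X, axis=1, kind="stable")
    ranks = np.argsort(order, axis=1, kind="stable")  # 0 = lowest expressed gene
    return ranks[:, gene_idx].mean(axis=1) / (X.shape[1] - 1)

# Toy sparse-ish count matrix: 6 cells x 8 genes (synthetic data).
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(6, 8)).astype(float)

smoothed = knn_smooth(counts, k=2)
scores = mean_rank_score(smoothed, gene_idx=[0, 3, 5])
print(scores)
```

Because each smoothed profile is an average over a cell and its neighbors, many of the zero-inflated ties in the raw counts are broken before ranking, which is the motivation the abstract gives for smoothing first.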

AVAILABILITY AND IMPLEMENTATION

The gssnng software is available from the Python Package Index (PyPI) and works with Scanpy AnnData objects. It can be installed using "pip install gssnng". More information and demo notebooks are available at https://github.com/IlyaLab/gssnng.

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/64b7/10599965/89a8b0ae8d25/vbad150f1.jpg
