Gibbs David L, Strasser Michael K, Huang Sui
Shmulevich Lab, Institute for Systems Biology, Seattle, WA 98106, United States.
Huang Lab, Institute for Systems Biology, Seattle, WA 98106, United States.
Bioinform Adv. 2023 Oct 18;3(1):vbad150. doi: 10.1093/bioadv/vbad150. eCollection 2023.
Gene set scoring (or enrichment) is a common dimension reduction task in bioinformatics that can focus on differences between groups or operate at the single-sample level. Gene sets can represent biological functions, molecular pathways, cell identities, and more. Gene set scores are context-dependent values that are useful for interpreting biological changes following experiments or perturbations. Single-sample scoring produces a set of scores, one for each member of a group, which can be analyzed with statistical models that include additional clinically important factors such as gender or age. However, the sparsity and technical noise of single-cell expression measurements create difficulties for these methods, which were originally designed for bulk expression profiling (microarrays, RNA-seq). These difficulties can be substantially reduced by first applying a smoothing transformation that shares gene measurement information within transcriptomic neighborhoods. In this work, we use the nearest neighbor graph of cells for matrix smoothing to produce high-quality gene set scores at the per-cell, per-group level, which are useful for visualization and statistical analysis.
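The smoothing idea described above can be illustrated with a small, self-contained sketch. It is not the gssnng implementation: the function names `knn_smooth` and `mean_score`, the brute-force Euclidean neighbor search, the unweighted neighborhood average, and the toy count matrix are all assumptions made for illustration. The sketch only shows how sharing measurements among a cell's nearest neighbors can fill in dropouts before a simple mean-based gene set score is computed.

```python
import math


def knn_smooth(matrix, k):
    """Average each cell's profile with its k nearest neighbors.

    Neighbors are found by brute-force Euclidean distance (an assumption;
    a real pipeline would typically reuse a precomputed neighbor graph).
    """
    smoothed = []
    for i, cell in enumerate(matrix):
        # Rank every other cell by its distance to the current cell.
        others = sorted(
            (j for j in range(len(matrix)) if j != i),
            key=lambda j: math.dist(cell, matrix[j]),
        )
        hood = [i] + others[:k]  # the cell itself plus its k neighbors
        smoothed.append(
            [sum(matrix[j][g] for j in hood) / len(hood) for g in range(len(cell))]
        )
    return smoothed


def mean_score(matrix, gene_names, gene_set):
    """Score each cell as the mean expression of the genes in gene_set."""
    idx = [gene_names.index(g) for g in gene_set if g in gene_names]
    return [sum(row[i] for i in idx) / len(idx) for row in matrix]


# Hypothetical toy data: rows are cells, columns are genes.
genes = ["A", "B", "C", "D", "E", "F"]
raw = [
    [4.0, 4.0, 0.0, 0.0, 0.0, 0.0],  # type 1, dropout in gene C
    [4.0, 4.0, 4.0, 0.0, 0.0, 0.0],  # type 1, fully observed
    [0.0, 4.0, 4.0, 0.0, 0.0, 0.0],  # type 1, dropout in gene A
    [0.0, 0.0, 0.0, 4.0, 4.0, 4.0],  # type 2, fully observed
    [0.0, 0.0, 0.0, 4.0, 4.0, 0.0],  # type 2, dropout in gene F
    [0.0, 0.0, 0.0, 0.0, 4.0, 4.0],  # type 2, dropout in gene D
]

smoothed = knn_smooth(raw, k=2)
raw_scores = mean_score(raw, genes, ["A", "B", "C"])
scores = mean_score(smoothed, genes, ["A", "B", "C"])

for r, s in zip(raw_scores, scores):
    print(f"raw={r:.2f}  smoothed={s:.2f}")
```

In this toy example the three type 1 cells receive different raw scores because of their dropouts, but the same score after smoothing, while the type 2 cells stay at zero for this gene set. An unweighted average is the simplest choice; weighting neighbors by graph connectivity would be a natural variation.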
The gssnng software is available from the Python Package Index (PyPI) and works with Scanpy AnnData objects. It can be installed with "pip install gssnng". More information and demo notebooks are available at https://github.com/IlyaLab/gssnng.