glmGamPoi: fitting Gamma-Poisson generalized linear models on single cell count data.

Affiliation

Genome Biology Unit, EMBL, Heidelberg 69117, Germany.

Publication information

Bioinformatics. 2021 Apr 5;36(24):5701-5702. doi: 10.1093/bioinformatics/btaa1009.

Abstract

MOTIVATION

The Gamma-Poisson distribution is a theoretically and empirically motivated model for the sampling variability of single cell RNA-sequencing counts and an essential building block for analysis approaches including differential expression analysis, principal component analysis and factor analysis. Existing implementations for inferring its parameters from data often struggle with the size of single cell datasets, which can comprise millions of cells; at the same time, they do not take full advantage of the fact that zero and other small numbers are frequent in the data. These limitations have hampered uptake of the model, leaving room for statistically inferior approaches such as logarithm(-like) transformation.
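The following minimal Python sketch illustrates how frequent zeros and small counts can be exploited. glmGamPoi itself is an R package, and this is not its code. The sketch assumes the common parameterization with mean μ and overdispersion θ, so that Var(y) = μ + θμ². In the log-likelihood, only the log-gamma terms depend on the count alone. They can therefore be evaluated once per distinct count value and weighted by that value's frequency, and data dominated by zeros and small numbers contain few distinct values. The function names `gp_logpmf`, `loglik_naive` and `loglik_tabulated` are hypothetical.

```python
import math
from collections import Counter


def gp_logpmf(y, mu, theta):
    """Log-probability of count y under a Gamma-Poisson (negative binomial)
    distribution with mean mu > 0 and overdispersion theta > 0,
    parameterized so that Var(y) = mu + theta * mu**2."""
    r = 1.0 / theta  # the negative binomial "size" parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            - r * math.log1p(mu / r) + y * math.log(mu / (r + mu)))


def loglik_naive(ys, mus, theta):
    """Sum of per-cell log-probabilities: three lgamma() calls per cell."""
    return sum(gp_logpmf(y, mu, theta) for y, mu in zip(ys, mus))


def loglik_tabulated(ys, mus, theta):
    """Same value, but the lgamma() terms, which depend only on the count y,
    are evaluated once per distinct count and weighted by its frequency."""
    r = 1.0 / theta
    # For the ys below: 0 occurs 6 times, 1 twice, 2 and 3 once each.
    freq = Counter(ys)
    count_part = sum(n * (math.lgamma(k + r) - math.lgamma(k + 1))
                     for k, n in freq.items())
    count_part -= len(ys) * math.lgamma(r)
    # Terms involving the cell-specific mean need only cheap logarithms.
    mean_part = sum(-r * math.log1p(mu / r) + y * math.log(mu / (r + mu))
                    for y, mu in zip(ys, mus))
    return count_part + mean_part


ys = [0, 0, 1, 0, 3, 0, 0, 1, 0, 2]  # mostly zeros and small counts
mus = [0.5, 0.8, 1.2, 0.4, 2.0, 0.6, 0.7, 1.1, 0.3, 1.5]
print(loglik_naive(ys, mus, 0.3), loglik_tabulated(ys, mus, 0.3))
```

Both functions return the same log-likelihood. In the tabulated version, the number of expensive `lgamma` evaluations scales with the number of distinct count values rather than with the number of cells.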

RESULTS

We present a new R package for fitting the Gamma-Poisson distribution to data with the characteristics of modern single cell datasets more quickly and more accurately than existing methods. The software can work with data on disk without having to load them into RAM simultaneously.
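The hypothetical Python sketch below shows the simplest form of such a fit. It is not glmGamPoi's R interface. For each gene it fits an intercept-only model: the mean is the sample mean (its maximum-likelihood estimate for fixed θ), and θ is chosen by golden-section search on log θ, assuming Var(y) = μ + θμ². To echo the idea of not loading everything into RAM at once, genes are read one row at a time from a text stream. This does not reproduce how the package accesses on-disk data. The CSV layout, search interval, tolerance and all function names are illustrative assumptions.

```python
import csv
import io
import math


def gp_loglik(ys, mu, theta):
    """Gamma-Poisson log-likelihood of counts ys sharing one mean mu > 0,
    with Var(y) = mu + theta * mu**2 (theta > 0)."""
    r = 1.0 / theta
    return sum(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
               - r * math.log1p(mu / r) + y * math.log(mu / (r + mu))
               for y in ys)


def golden_max(f, a, b, tol=1e-8):
    """Maximize a unimodal function f on [a, b] by golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2


def fit_overdispersion(ys, log_theta_range=(-12.0, 8.0)):
    """Intercept-only fit for one gene. Returns (mu, theta), or None for an
    all-zero gene, where theta is not identifiable."""
    mu = sum(ys) / len(ys)
    if mu == 0:
        return None
    log_theta = golden_max(lambda lt: gp_loglik(ys, mu, math.exp(lt)),
                           *log_theta_range)
    return mu, math.exp(log_theta)


def fit_streaming(stream):
    """Fit genes one at a time from rows 'gene,count1,count2,...', so only a
    single row of the count matrix is held in memory at once."""
    for row in csv.reader(stream):
        gene, counts = row[0], [int(x) for x in row[1:]]
        yield gene, fit_overdispersion(counts)


data = io.StringIO("geneA,0,0,1,0,3,0,0,1,0,2\ngeneB,0,0,0,0,0,0,0,0,0,0\n")
for gene, fit in fit_streaming(data):
    print(gene, fit)
```

When a gene's sample variance does not exceed its mean, the likelihood is maximized as θ approaches 0 (the Poisson limit). The search then returns a value at the lower end of the interval. A full GLM fit with covariates would also need an iterative update of the mean for each cell, which this sketch omits.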

AVAILABILITY AND IMPLEMENTATION

The package glmGamPoi is available from Bioconductor for Windows, macOS and Linux, and source code is available on github.com/const-ae/glmGamPoi under a GPL-3 license. The scripts to reproduce the results of this paper are available on github.com/const-ae/glmGamPoi-Paper.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
