Accelerated dimensionality reduction of single-cell RNA sequencing data with fastglmpca.

Author information

Weine Eric, Carbonetto Peter, Stephens Matthew

Publication information

bioRxiv. 2024 Jul 4:2024.03.23.586420. doi: 10.1101/2024.03.23.586420.

Abstract

SUMMARY

Motivated by theoretical and practical issues that arise when applying Principal Components Analysis (PCA) to count data, Townes et al. introduced "Poisson GLM-PCA", a variation of PCA adapted to count data, as a tool for dimensionality reduction of single-cell RNA sequencing (RNA-seq) data. However, fitting GLM-PCA is computationally challenging. Here we study this problem and show that a simple algorithm, which we call "Alternating Poisson Regression" (APR), produces better-quality fits, and in less time, than existing algorithms. APR is also memory-efficient and lends itself to parallel implementation on multi-core processors, both of which are helpful for handling large single-cell RNA-seq data sets. We illustrate the benefits of this approach on two published single-cell RNA-seq data sets. The new algorithms are implemented in an R package, fastglmpca.
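To make the idea of alternating Poisson regressions concrete, here is a minimal Python sketch. It is not the fastglmpca implementation. It assumes a simplified model in which the count x_ij is Poisson with log-rate equal to the (i, j) entry of L F^T, with no size factors, covariates, or orthogonality constraints. With F fixed, each row of L is the coefficient vector of an ordinary Poisson regression, and vice versa. The names `row_loglik`, `update_row`, `loglik`, and `apr` are hypothetical, and the inner solver (coordinate-wise Newton steps with step halving) is one illustrative choice.

```python
import math
import random

def row_loglik(x, B, w):
    """Poisson log-likelihood (up to a constant) for counts x with log-rates B @ w."""
    total = 0.0
    for x_j, b_j in zip(x, B):
        eta = sum(b * c for b, c in zip(b_j, w))
        total += x_j * eta - math.exp(eta)
    return total

def update_row(x, B, w):
    """One sweep of coordinate-wise Newton steps on w, halving any step that lowers the fit."""
    for k in range(len(w)):
        etas = [sum(b * c for b, c in zip(b_j, w)) for b_j in B]
        grad = sum(b_j[k] * (x_j - math.exp(e)) for x_j, b_j, e in zip(x, B, etas))
        hess = sum(b_j[k] ** 2 * math.exp(e) for b_j, e in zip(B, etas))
        if hess <= 1e-12:
            continue  # coordinate has no influence on the rates; skip it
        old, base, step = w[k], row_loglik(x, B, w), grad / hess
        for _ in range(30):
            w[k] = old + step
            try:
                if row_loglik(x, B, w) >= base:
                    break  # accept the first step that does not reduce the log-likelihood
            except OverflowError:
                pass  # step was far too large; shrink it
            step /= 2
        else:
            w[k] = old  # no acceptable step found; keep the previous value
    return w

def loglik(X, L, F):
    """Total log-likelihood of the count matrix X under log-rates L F^T."""
    return sum(row_loglik(x_i, F, l_i) for x_i, l_i in zip(X, L))

def apr(X, K, n_iter=20, seed=1):
    """Alternate between Poisson regressions for the rows of L and the rows of F."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    L = [[rng.gauss(0, 0.1) for _ in range(K)] for _ in range(n)]
    F = [[rng.gauss(0, 0.1) for _ in range(K)] for _ in range(m)]
    Xt = [list(col) for col in zip(*X)]  # columns of X, used when updating F
    history = [loglik(X, L, F)]
    for _ in range(n_iter):
        for i in range(n):   # F fixed: one regression per row of X
            update_row(X[i], F, L[i])
        for j in range(m):   # L fixed: one regression per column of X
            update_row(Xt[j], L, F[j])
        history.append(loglik(X, L, F))
    return L, F, history

# Toy count matrix with two groups of rows (made-up numbers, for illustration only).
X_demo = [[5, 4, 0, 1], [6, 5, 1, 0], [0, 1, 7, 6], [1, 0, 5, 8]]
L_hat, F_hat, history = apr(X_demo, K=2, n_iter=10)
print(history[0], history[-1])
```

Within each half of an iteration the row-wise regressions do not depend on one another, which is consistent with the abstract's remark that the approach lends itself to parallel implementation. Because every accepted step leaves a row's log-likelihood no lower than before, the total log-likelihood in this sketch never decreases from one iteration to the next, apart from floating-point rounding.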

AVAILABILITY AND IMPLEMENTATION

The fastglmpca R package is released on CRAN for Windows, macOS and Linux, and the source code is available at github.com/stephenslab/fastglmpca under the open source GPL-3 license. Scripts to reproduce the results in this paper are also available in the GitHub repository.

CONTACT

mstephens@uchicago.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available online.

Similar articles

1
Accelerated dimensionality reduction of single-cell RNA sequencing data with fastglmpca.
bioRxiv. 2024 Jul 4:2024.03.23.586420. doi: 10.1101/2024.03.23.586420.
2
Accelerated dimensionality reduction of single-cell RNA sequencing data with fastglmpca.
Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae494.
3
glmGamPoi: fitting Gamma-Poisson generalized linear models on single cell count data.
Bioinformatics. 2021 Apr 5;36(24):5701-5702. doi: 10.1093/bioinformatics/btaa1009.
4
SCell: integrated analysis of single-cell RNA-seq data.
Bioinformatics. 2016 Jul 15;32(14):2219-20. doi: 10.1093/bioinformatics/btw201. Epub 2016 Apr 19.
5
smallWig: parallel compression of RNA-seq WIG files.
Bioinformatics. 2016 Jan 15;32(2):173-80. doi: 10.1093/bioinformatics/btv561. Epub 2015 Sep 30.
6
Rcount: simple and flexible RNA-Seq read counting.
Bioinformatics. 2015 Feb 1;31(3):436-7. doi: 10.1093/bioinformatics/btu680. Epub 2014 Oct 15.
7
Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data.
Genome Biol. 2021 Sep 6;22(1):258. doi: 10.1186/s13059-021-02451-7.
8
Beta-Poisson model for single-cell RNA-seq data analyses.
Bioinformatics. 2016 Jul 15;32(14):2128-35. doi: 10.1093/bioinformatics/btw202. Epub 2016 Apr 19.
10
RNA-SeQC 2: efficient RNA-seq quality control and quantification for large cohorts.
Bioinformatics. 2021 Sep 29;37(18):3048-3050. doi: 10.1093/bioinformatics/btab135.
