PsiNorm: a scalable normalization for single-cell RNA-seq data.

Affiliations

Department of Biology, University of Padova, Padua 35121, Italy.

Department of Statistical Sciences, University of Padova, Padua 35121, Italy.

Publication information

Bioinformatics. 2021 Dec 22;38(1):164-172. doi: 10.1093/bioinformatics/btab641.

Abstract

MOTIVATION

Single-cell RNA sequencing (scRNA-seq) enables transcriptome-wide gene expression measurements at single-cell resolution, providing a comprehensive view of the composition and dynamics of tissue and organism development. The evolution of scRNA-seq protocols has led to a dramatic increase in cell throughput, exacerbating many of the computational and statistical issues that previously arose for bulk sequencing. In particular, with scRNA-seq data all analysis steps, including normalization, have become computationally intensive in terms of both memory usage and computation time. In this context, new accurate methods that scale efficiently are desirable.

RESULTS

We propose PsiNorm, a between-sample normalization method based on the estimate of the parameter of the power-law Pareto distribution. We show that the Pareto distribution closely resembles scRNA-seq data, especially data from platforms that use unique molecular identifiers. Motivated by this result, we implement PsiNorm, a simple and highly scalable normalization method. We benchmark PsiNorm against seven other methods in terms of cluster identification, concordance and required computational resources. We demonstrate that PsiNorm is among the top-performing methods, showing a good trade-off between accuracy and scalability. Moreover, PsiNorm does not need a reference, a characteristic that makes it useful in supervised classification settings, in which new out-of-sample data need to be normalized.
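
The idea of normalizing by a Pareto parameter estimate can be illustrated with a minimal sketch. The abstract does not give the estimator, so this assumes the standard maximum-likelihood estimate of the Pareto shape parameter, applied to each cell's counts after adding a pseudo-count of one, with each cell rescaled by its own estimate. The function names pareto_shape_mle and normalize_cells are hypothetical and are not part of the scone API; the packaged implementation may differ in its details.

```python
import math


def pareto_shape_mle(cell_counts):
    """Maximum-likelihood estimate of the Pareto shape parameter for one cell.

    Assumption: counts are shifted by a pseudo-count of 1 so that the
    minimum is positive, and the scale is taken as the smallest shifted value.
    """
    shifted = [c + 1 for c in cell_counts]
    x_min = min(shifted)
    log_excess = sum(math.log(x / x_min) for x in shifted)
    if log_excess == 0:
        # Every gene has the same count, so the estimate is undefined.
        raise ValueError("all counts in this cell are equal")
    return len(shifted) / log_excess


def normalize_cells(cells):
    """Rescale each cell (a list of gene counts) by its own shape estimate.

    Assumption: multiplying by the estimate is the scaling step. Each cell
    is processed independently, so no reference sample is involved.
    """
    normalized = []
    for counts in cells:
        alpha = pareto_shape_mle(counts)
        normalized.append([c * alpha for c in counts])
    return normalized


if __name__ == "__main__":
    # Two toy cells with three genes each; the second is sequenced more deeply.
    toy_cells = [[0, 1, 3], [0, 3, 15]]
    for row in normalize_cells(toy_cells):
        print([round(v, 3) for v in row])
```

Because each cell's factor depends only on that cell's own counts, a newly observed cell can be normalized under this sketch without revisiting the data already processed, which is consistent with the reference-free property described above.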

AVAILABILITY AND IMPLEMENTATION

PsiNorm is implemented in the scone Bioconductor package and available at https://bioconductor.org/packages/scone/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
