Benchmarking principal component analysis for large-scale single-cell RNA-sequencing.

Affiliations

Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, Wako, Saitama, 351-0198, Japan.

Japan Science and Technology Agency, PRESTO, 5-3, Yonbancho, Chiyoda-ku, Tokyo, 102-8666, Japan.

Publication information

Genome Biol. 2020 Jan 20;21(1):9. doi: 10.1186/s13059-019-1900-3.

PMID: 31955711

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6970290/

Abstract

BACKGROUND

Principal component analysis (PCA) is an essential method for analyzing single-cell RNA-seq (scRNA-seq) datasets, but for large-scale scRNA-seq datasets, PCA takes a long time to compute and consumes large amounts of memory.
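
To make the cost concrete, here is a minimal sketch of exact PCA through a full singular value decomposition. It assumes a dense, already normalized and log-transformed cells × genes matrix held in NumPy; the function name `exact_pca` and the toy Poisson data are illustrative assumptions, not code from the paper.

```python
import numpy as np

def exact_pca(X, k):
    """Exact PCA of a dense cells x genes matrix X via a full thin SVD.

    Hypothetical helper for illustration. Returns (scores, components,
    explained_variance) for the top k principal components.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each gene (column)
    # Full thin SVD: time grows as n_cells * n_genes * min(n_cells, n_genes),
    # and the dense centered copy Xc must fit in memory. Both become
    # prohibitive when the number of cells reaches the millions.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]                     # cell embeddings (n_cells x k)
    components = Vt[:k]                           # gene loadings (k x n_genes)
    explained_variance = s[:k] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance

# Toy usage: 500 "cells" x 200 "genes" of log-transformed Poisson counts.
rng = np.random.default_rng(0)
X = np.log1p(rng.poisson(1.0, size=(500, 200)))
scores, comps, var = exact_pca(X, k=10)
print(scores.shape, comps.shape)
```

Every singular triplet is computed even though only the top k are kept, which is the waste that truncated (Krylov subspace or randomized) solvers try to avoid.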

RESULTS

In this work, we review the existing fast and memory-efficient PCA algorithms and implementations and evaluate their practical application to large-scale scRNA-seq datasets. Our benchmark shows that some PCA algorithms based on Krylov subspace and randomized singular value decomposition are fast, memory-efficient, and more accurate than the other algorithms.
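
As a rough illustration of the randomized family mentioned above, the following is a minimal randomized SVD sketch in the style of Halko, Martinsson and Tropp, written with NumPy. It is not the code of any implementation benchmarked in the paper: the function name `randomized_pca` and the defaults `oversample=10` and `n_iter=4` are assumptions, and centering is done explicitly for simplicity, whereas memory-efficient tools avoid densifying a sparse matrix. Krylov subspace solvers (Lanczos-type methods) are not shown.

```python
import numpy as np

def randomized_pca(X, k, oversample=10, n_iter=4, seed=0):
    """Approximate top-k PCA via randomized SVD (hypothetical helper).

    X is a dense cells x genes array. Returns (scores, components, singular_values).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)          # explicit centering (densifies; sketch only)
    n, p = Xc.shape
    l = min(k + oversample, n, p)    # sketch width with a few extra columns
    # 1. Sketch the column space with a Gaussian test matrix: Q spans Xc @ Omega.
    Omega = rng.standard_normal((p, l))
    Q, _ = np.linalg.qr(Xc @ Omega)
    # 2. Power (subspace) iterations improve accuracy when singular values
    #    decay slowly; re-orthonormalize at every step for numerical stability.
    for _ in range(n_iter):
        Z, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Z)
    # 3. Project onto the small subspace and take an exact SVD of B (l x p).
    B = Q.T @ Xc
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    scores = U[:, :k] * s[:k]
    return scores, Vt[:k], s[:k]

# Toy usage: rank-5 signal plus small noise, 1000 cells x 300 genes.
rng = np.random.default_rng(1)
signal = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 300))
X = signal + 0.01 * rng.standard_normal((1000, 300))
scores, comps, s_approx = randomized_pca(X, k=5)
s_exact = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)[:5]
print(np.max(np.abs(s_approx - s_exact) / s_exact))  # relative error vs exact SVD
```

The only full SVD here is of the small l × p matrix `B`, so the dominant cost is a handful of products with `Xc`; that is the property that lets such methods scale to large cell counts.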

CONCLUSION

We develop a guideline to select an appropriate PCA implementation based on the differences in the computational environment of users and developers.






