Enabling scalable single-cell transcriptomic analysis through distributed computing with Apache spark.

Authors

Adil Asif, Bhattacharya Namrata, Khan Naveed Jeelani, Asger Mohammed

Affiliations

Department of Computer Sciences, Baba Ghulam Shah Badshah University, Rajouri, India.

Department of Pathology and Laboratory Medicine, School of Medicine, Indiana University Indianapolis, Indianapolis, IN, USA.

Publication information

Sci Rep. 2025 Jul 29;15(1):27713. doi: 10.1038/s41598-025-12897-5.


DOI:10.1038/s41598-025-12897-5
PMID:40731055
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12307815/
Abstract

As the field of single-cell genomics continues to develop, the generation of large-scale scRNA-seq datasets has become more prevalent. Although these datasets offer tremendous potential for shedding light on the complex biology of individual cells, the sheer volume of data presents significant challenges for management and analysis. Recently, a new discipline known as "big single-cell data science" has emerged to address these challenges. Within this field, a variety of computational tools have been developed to facilitate the processing and interpretation of scRNA-seq data. However, several of these tools focus primarily on the analytical aspect and tend to overlook the growing data deluge generated by scRNA-seq experiments. In this study, we address this challenge and present a novel parallel analytical framework, scSPARKL, that leverages the power of Apache Spark to enable the efficient analysis of single-cell transcriptomic data. scSPARKL is built on a rich set of staged algorithms developed to optimize Apache Spark's work environment. The tool incorporates six key operations for dealing with single-cell Big Data: data reshaping, data preprocessing, cell/gene filtering, data normalization, dimensionality reduction, and clustering. By utilizing Spark's unlimited scalability, fault tolerance, and parallelism, the tool enables researchers to rapidly and accurately analyze scRNA-seq datasets of any size. We demonstrate the utility of our framework and algorithms through a series of experiments on real-world scRNA-seq data. Overall, our results suggest that scSPARKL represents a powerful and flexible tool for the analysis of single-cell transcriptomic data, with broad applications across the fields of biology and medicine.
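
To make the staged data flow more concrete, the following is a minimal sketch of three of the six operations named in the abstract (data reshaping, cell filtering, and normalization). It is plain Python, not Spark, and it is not scSPARKL's code: the function names, the `min_genes` threshold, the `scale` factor of 10,000, and the toy matrix are all assumptions made for illustration. Each step is written as a per-cell grouping and aggregation, which is the kind of operation a distributed engine can split across partitions.

```python
from collections import defaultdict
from math import log1p


def reshape_to_long(matrix, genes, cells):
    """Turn a wide gene-by-cell matrix into (cell, gene, count) records.

    Zero counts are dropped, so the long form stays sparse.
    """
    return [
        (cells[j], genes[i], count)
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if count > 0
    ]


def filter_cells(records, min_genes):
    """Keep only cells that express at least `min_genes` distinct genes."""
    genes_per_cell = defaultdict(set)
    for cell, gene, _ in records:
        genes_per_cell[cell].add(gene)
    keep = {c for c, g in genes_per_cell.items() if len(g) >= min_genes}
    return [r for r in records if r[0] in keep]


def normalize(records, scale=10_000):
    """Library-size normalize each cell, then apply log1p.

    `scale` is an assumed, commonly used scaling factor.
    """
    totals = defaultdict(float)
    for cell, _, count in records:
        totals[cell] += count
    return [
        (cell, gene, log1p(count / totals[cell] * scale))
        for cell, gene, count in records
    ]


# Hypothetical toy data: rows are genes, columns are cells.
genes = ["g1", "g2", "g3"]
cells = ["c1", "c2", "c3"]
matrix = [
    [5, 0, 1],
    [3, 0, 0],
    [2, 4, 0],
]

long_records = reshape_to_long(matrix, genes, cells)
filtered = filter_cells(long_records, min_genes=2)
normalized = normalize(filtered)
```

In this toy run only cell `c1` expresses two or more genes, so it is the only cell that survives filtering. The long (cell, gene, count) layout is chosen here because grouping by a key column is straightforward to parallelize; whether scSPARKL uses the same layout is not stated in the abstract.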

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fae4/12307815/e8e119216908/41598_2025_12897_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fae4/12307815/26b2bf48f45f/41598_2025_12897_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fae4/12307815/9fc6a7583116/41598_2025_12897_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fae4/12307815/164ddc12c2a2/41598_2025_12897_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fae4/12307815/cc267728693d/41598_2025_12897_Fig5_HTML.jpg