EpiCarousel: memory- and time-efficient identification of metacells for atlas-level single-cell chromatin accessibility data.

Author information

Li Sijie, Li Yuxi, Sun Yu, Li Yaru, Chen Xiaoyang, Tang Songming, Chen Shengquan

Affiliations

School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China.

Institute of Health Service and Transfusion Medicine, Beijing 100850, China.

Publication information

Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae191.

Abstract

SUMMARY

Recent technical advances in single-cell chromatin accessibility sequencing (scCAS) have brought new insights into the characterization of epigenetic heterogeneity. As single-cell genomics experiments scale up to hundreds of thousands of cells, the computational resources required for downstream analysis grow intractably large and exceed what most researchers have available. Here, we propose EpiCarousel, a tailored Python package based on lazy loading, parallel processing, and community detection for memory- and time-efficient identification of metacells, i.e. the emergence of homogeneous cells, in large-scale scCAS data. In comprehensive experiments on five datasets spanning various protocols, sample sizes, dimensions, numbers of cell types, and degrees of cell-type imbalance, EpiCarousel outperformed baseline methods in systematic evaluations of memory usage, computational time, and multiple downstream analyses, including cell type identification. Moreover, EpiCarousel executes preprocessing and downstream cell clustering on an atlas-level dataset with 707 043 cells and 1 154 611 peaks within 2 h while consuming <75 GB of RAM, and it characterizes cell heterogeneity better than state-of-the-art methods.
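
The general idea of chunk-wise metacell identification can be illustrated with a toy sketch. This is not EpiCarousel's implementation or API: every function name here (`iter_chunks`, `knn_adjacency`, `label_propagation`, `metacells_for_chunk`) is hypothetical, the data are synthetic, and a simple label-propagation routine stands in for whatever community detection algorithm the package actually uses. The sketch assumes that cells are read one chunk at a time, that communities are found on a k-nearest-neighbour graph within each chunk, and that a metacell is the sum of the accessibility profiles of its member cells.

```python
import numpy as np

def iter_chunks(X, chunk_size):
    """Yield consecutive blocks of cells (a stand-in for lazy loading from disk)."""
    for start in range(0, X.shape[0], chunk_size):
        yield X[start:start + chunk_size]

def knn_adjacency(X, k):
    """Symmetric boolean k-nearest-neighbour graph (squared Euclidean distance)."""
    X = np.asarray(X, dtype=float)          # avoid integer wrap-around on subtraction
    n = X.shape[0]
    k = min(k, n - 1)                        # a chunk may hold fewer than k + 1 cells
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)              # a cell is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]        # indices of the k closest cells
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nn.ravel()] = True
    return A | A.T

def label_propagation(A, n_iter=20):
    """Toy community detection: each cell adopts its neighbours' most common label."""
    labels = np.arange(A.shape[0])
    for _ in range(n_iter):
        changed = False
        for i in range(A.shape[0]):
            neigh = labels[A[i]]
            if neigh.size == 0:
                continue
            vals, counts = np.unique(neigh, return_counts=True)
            best = vals[counts.argmax()]     # ties resolve to the smallest label
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    # Relabel communities as 0 .. n_communities - 1.
    return np.unique(labels, return_inverse=True)[1]

def metacells_for_chunk(X, k=5):
    """Sum the accessibility profiles of the cells in each community."""
    labels = label_propagation(knn_adjacency(X, k))
    M = np.zeros((labels.max() + 1, X.shape[1]))
    np.add.at(M, labels, X)                  # unbuffered row-wise accumulation
    return M, labels

# Synthetic binary cell-by-peak matrix with two cell types (assumed toy data).
rng = np.random.default_rng(0)
cell_type = rng.integers(0, 2, size=40)
open_prob = np.where(cell_type[:, None] == (np.arange(30)[None, :] >= 15), 0.9, 0.05)
X = (rng.random((40, 30)) < open_prob).astype(np.int8)

# Chunks are independent, so each could also be handed to a separate worker.
metacell_matrix = np.vstack([metacells_for_chunk(c)[0] for c in iter_chunks(X, 20)])
print(X.shape, "->", metacell_matrix.shape)
```

Because each chunk is processed independently, peak memory in this sketch depends on the chunk size rather than on the total number of cells, and the per-chunk work could be distributed across processes. The dense pairwise-distance step is only practical for small chunks; a real pipeline would use sparse matrices and an approximate neighbour search.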

AVAILABILITY AND IMPLEMENTATION

The EpiCarousel software is well documented and freely available at https://github.com/biox-nku/epicarousel. It interoperates seamlessly with a wide range of scCAS analysis toolkits.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/87a7/11037479/85890f5b21f4/btae191f1.jpg
