AnnSQL: A Python SQL-based package for fast large-scale single-cell genomics analysis using minimal computational resources.

Author information

Pavan Kenny, Saunders Arpiar

Publication information

bioRxiv. 2025 Mar 22:2024.11.02.621676. doi: 10.1101/2024.11.02.621676.

Abstract

As single-cell genomics technologies continue to accelerate biological discovery, software tools that use elegant syntax and minimal computational resources to analyze atlas-scale datasets are increasingly needed. Here we introduce AnnSQL, a Python package that constructs an AnnData-inspired database using the in-process DuckDB engine, enabling orders-of-magnitude performance improvements for parsing single-cell genomics datasets with the ease of SQL. We highlight AnnSQL functionality and demonstrate transformative runtime improvements by comparing AnnData and AnnSQL operations on a 4.4-million-cell single-nucleus RNA-seq dataset: AnnSQL-based operations ran in minutes on a laptop, whereas the equivalent AnnData operations largely failed (or were ~700x slower) on a high-performance computing cluster. AnnSQL lowers computational barriers for large-scale single-cell/nucleus RNA-seq analysis on a personal computer, while demonstrating a promising computational infrastructure that can be extended to complete single-cell workflows across various genome-wide measurements.

AVAILABILITY AND IMPLEMENTATION

AnnSQL is a pip-installable package available at https://github.com/ArpiarSaundersLab/annsql, with documentation at https://docs.annsql.com.
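
To make the idea of an "AnnData-inspired database" queried with SQL more concrete, the following is a minimal sketch and not AnnSQL's actual API or schema. It assumes a layout with an `X` table (one row per cell, one column per gene) and an `obs` table (per-cell annotations); the table names, column names, and toy counts are all hypothetical. Python's built-in `sqlite3` is used here as a stand-in for DuckDB purely so the example runs without extra dependencies.

```python
import sqlite3

# Toy data (hypothetical): 4 cells x 3 genes, plus one annotation per cell.
genes = ["GeneA", "GeneB", "GeneC"]
counts = {
    "cell_1": [5, 0, 2],
    "cell_2": [0, 3, 1],
    "cell_3": [7, 1, 0],
    "cell_4": [0, 4, 4],
}
cell_types = {
    "cell_1": "neuron",
    "cell_2": "glia",
    "cell_3": "neuron",
    "cell_4": "glia",
}

# In-memory database; sqlite3 stands in for the DuckDB engine named in the abstract.
con = sqlite3.connect(":memory:")

# Assumed layout: X holds the expression matrix, one column per gene.
gene_cols = ", ".join(f'"{g}" REAL' for g in genes)
con.execute(f"CREATE TABLE X (cell_id TEXT PRIMARY KEY, {gene_cols})")
placeholders = ", ".join("?" for _ in range(len(genes) + 1))
con.executemany(
    f"INSERT INTO X VALUES ({placeholders})",
    [(cell_id, *values) for cell_id, values in counts.items()],
)

# Assumed layout: obs holds per-cell metadata, keyed by the same cell_id.
con.execute("CREATE TABLE obs (cell_id TEXT PRIMARY KEY, cell_type TEXT)")
con.executemany("INSERT INTO obs VALUES (?, ?)", list(cell_types.items()))

# Mean expression of one gene per cell type, expressed as a plain SQL join.
mean_by_type = con.execute(
    """
    SELECT obs.cell_type, AVG(X."GeneA") AS mean_gene_a, COUNT(*) AS n_cells
    FROM X
    JOIN obs ON X.cell_id = obs.cell_id
    GROUP BY obs.cell_type
    ORDER BY obs.cell_type
    """
).fetchall()

# Filter cells by expression without loading the whole matrix into Python.
geneb_positive = [
    row[0]
    for row in con.execute(
        'SELECT cell_id FROM X WHERE "GeneB" > 0 ORDER BY cell_id'
    )
]

print(mean_by_type)
print(geneb_positive)
```

The point of the sketch is the design choice the abstract describes: filtering and aggregation are pushed into the database engine rather than performed on an in-memory matrix. The performance figures reported above depend on DuckDB and on AnnSQL's own implementation, neither of which this toy example reproduces.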
