Streamlining remote nanopore data access with slow5curl.

Affiliations

Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, NSW 2010, Australia.

Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Sydney, NSW 2010, Australia.

Publication information

Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giae016.

Abstract

BACKGROUND

As adoption of nanopore sequencing technology continues to advance, the need to maintain large volumes of raw current signal data for reanalysis with updated algorithms is a growing challenge. Here we introduce slow5curl, a software package designed to streamline nanopore data sharing, accessibility, and reanalysis.

RESULTS

Slow5curl allows a user to fetch a specified read or group of reads from a raw nanopore dataset stored on a remote server, such as a public data repository, without downloading the entire file. Slow5curl uses an index to quickly fetch specific reads from a large dataset in SLOW5/BLOW5 format and highly parallelized data access requests to maximize download speeds. Using all public nanopore data from the Human Pangenome Reference Consortium (>22 TB), we demonstrate how slow5curl can be used to quickly fetch and reanalyze raw signal reads corresponding to a set of target genes from each individual in a large cohort dataset (n = 91), minimizing the time, egress costs, and local storage requirements for their reanalysis.
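
The general idea of index-driven, parallel partial downloads can be illustrated with a minimal Python sketch. This is not slow5curl's implementation or API: the index layout (read ID mapped to a byte offset and record length), the function names `http_range_fetcher` and `fetch_reads`, and the toy in-memory data are all assumptions made for illustration. The sketch only shows how an index lets a client request just the byte ranges it needs, several at a time, instead of the whole file.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple
from urllib.request import Request, urlopen

# Hypothetical index layout: read ID -> (byte offset, record length in bytes).
Index = Dict[str, Tuple[int, int]]
RangeFetcher = Callable[[int, int], bytes]


def http_range_fetcher(url: str) -> RangeFetcher:
    """Build a fetcher that downloads one byte range of `url` per call."""
    def fetch(offset: int, length: int) -> bytes:
        # The end position of an HTTP byte range is inclusive.
        byte_range = f"bytes={offset}-{offset + length - 1}"
        with urlopen(Request(url, headers={"Range": byte_range})) as resp:
            if resp.status != 206:  # 206 Partial Content means the range was honored
                raise RuntimeError("server did not honor the range request")
            return resp.read()
    return fetch


def fetch_reads(index: Index, read_ids: Iterable[str],
                fetch: RangeFetcher, workers: int = 8) -> Dict[str, bytes]:
    """Fetch only the requested records, issuing range requests in parallel."""
    ids = list(read_ids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        records = pool.map(lambda rid: fetch(*index[rid]), ids)
        return dict(zip(ids, records))


# Offline demonstration: an in-memory blob stands in for the remote file.
blob = b"AAAABBBBBBCC"
index = {"read_a": (0, 4), "read_b": (4, 6), "read_c": (10, 2)}
local_fetch = lambda offset, length: blob[offset:offset + length]
subset = fetch_reads(index, ["read_c", "read_a"], local_fetch)
```

Passing `http_range_fetcher(url)` in place of `local_fetch` would issue real range requests, provided the server supports them. Taking the fetcher as a parameter keeps the parallel logic testable without network access.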

CONCLUSIONS

We provide slow5curl as a free, open-source package that will reduce frictions in data sharing for the nanopore community: https://github.com/BonsonW/slow5curl.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47f1/11010652/2050153d1e6d/giae016fig1.jpg
