pyBedGraph: a python package for fast operations on 1D genomic signal tracks.

Affiliations

Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA 92093, USA.

The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA.

Publication information

Bioinformatics. 2020 May 1;36(10):3234-3235. doi: 10.1093/bioinformatics/btaa061.

Abstract

MOTIVATION

Modern genomic research is driven by next-generation sequencing experiments such as ChIP-seq and ChIA-PET that generate coverage files for transcription factor binding, as well as DHS and ATAC-seq that yield coverage files for chromatin accessibility. Such files are in a bedGraph text format or a bigWig binary format. Obtaining summary statistics in a given region is a fundamental task in analyzing protein binding intensity or chromatin accessibility. However, the existing Python package for operating on coverage files is not optimized for speed.

RESULTS

We developed pyBedGraph, a Python package to quickly obtain summary statistics for a given interval in a bedGraph or a bigWig file. When tested on 12 ChIP-seq, ATAC-seq, RNA-seq and ChIA-PET datasets, pyBedGraph is on average 260 times faster than the existing program pyBigWig. On average, pyBedGraph can look up the exact mean signal of 1 million regions in ∼0.26 s and can compute their approximate means in <0.12 s on a conventional laptop.
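To illustrate what "the exact mean signal of a region" involves, here is a minimal Python sketch of one common approach: expand the bedGraph intervals into a per-base array, build prefix sums once, and then answer each interval query with a single subtraction. This is an assumption made for illustration and is not taken from the pyBedGraph source; the function names (`parse_bedgraph`, `build_prefix_sums`, `mean_signal`) and the example data are hypothetical and are not part of the pyBedGraph or pyBigWig APIs.

```python
from itertools import accumulate


def parse_bedgraph(text, chrom):
    """Return (start, end, value) tuples for one chromosome from bedGraph text."""
    intervals = []
    for line in text.splitlines():
        fields = line.split()
        # Keep only 4-column data lines for the requested chromosome;
        # this also skips blank lines and track/browser header lines.
        if len(fields) != 4 or fields[0] != chrom:
            continue
        intervals.append((int(fields[1]), int(fields[2]), float(fields[3])))
    return intervals


def build_prefix_sums(intervals, chrom_size):
    """Return a list where prefix[i] is the summed signal over bases [0, i)."""
    per_base = [0.0] * chrom_size  # bases not covered by any interval stay at 0
    for start, end, value in intervals:
        for pos in range(start, min(end, chrom_size)):
            per_base[pos] = value
    return [0.0] + list(accumulate(per_base))


def mean_signal(prefix, start, end):
    """Exact mean over the half-open interval [start, end), in O(1) per query."""
    if end <= start:
        raise ValueError("interval must have positive length")
    return (prefix[end] - prefix[start]) / (end - start)


# Hypothetical toy data in bedGraph layout: chrom, start, end, value
# (0-based, half-open coordinates).
EXAMPLE = """\
chr1\t0\t4\t2.0
chr1\t4\t6\t5.0
chr1\t8\t10\t1.0
"""

intervals = parse_bedgraph(EXAMPLE, "chr1")
prefix = build_prefix_sums(intervals, chrom_size=10)
print(mean_signal(prefix, 2, 6))   # 3.5
print(mean_signal(prefix, 0, 10))  # 2.0
```

The trade-off in this sketch is memory for speed: the per-base array costs space proportional to chromosome length, but after the one-time pass every query costs the same regardless of interval width. A production implementation would likely use compact numeric arrays rather than Python lists.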

AVAILABILITY AND IMPLEMENTATION

pyBedGraph is publicly available at https://github.com/TheJacksonLaboratory/pyBedGraph under the MIT license.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a6b/7214040/7b64aa2a8d2f/btaa061f1.jpg
