Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA 01605, United States.
Department of Systems Biology, UMass Chan Medical School, Worcester, MA 01605, United States.
Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae088.
Genomic intervals are one of the most prevalent data structures in computational genome biology, and are used to represent features ranging from genes, to DNA binding sites, to disease variants. Operations on genomic intervals provide a language for asking questions about relationships between features. While there are excellent interval arithmetic tools for the command line, they are not smoothly integrated into Python, one of the most popular general-purpose computational and visualization environments.
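To make "operations on genomic intervals" concrete, the sketch below finds all overlapping pairs between two interval sets, such as genes and DNA binding sites. It is a minimal illustration and not bioframe's implementation. It assumes intervals are plain `(chrom, start, end)` tuples in 0-based, half-open coordinates (the BED convention), so book-ended intervals such as [100, 200) and [200, 210) do not overlap. The function name `overlap_pairs` is hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def overlap_pairs(a, b):
    """Return (i, j) index pairs where interval a[i] overlaps interval b[j].

    Hypothetical helper. Intervals are (chrom, start, end) tuples in
    0-based, half-open coordinates, so [s, e) covers bases s..e-1.
    """
    # Group the intervals of b by chromosome, sorted by start coordinate.
    by_chrom = defaultdict(list)
    for j, (chrom, start, end) in enumerate(b):
        by_chrom[chrom].append((start, end, j))
    for intervals in by_chrom.values():
        intervals.sort()

    pairs = []
    for i, (chrom, start, end) in enumerate(a):
        candidates = by_chrom.get(chrom, [])
        # Only intervals of b that start before a[i] ends can overlap it;
        # bisect finds the first candidate whose start is >= end.
        stop = bisect_left(candidates, (end,))
        for b_start, b_end, j in candidates[:stop]:
            # Half-open overlap test: b must end after a[i] starts.
            if b_end > start:
                pairs.append((i, j))
    return pairs


genes = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
sites = [("chr1", 150, 160), ("chr1", 200, 210), ("chr2", 40, 45), ("chr3", 0, 10)]
print(overlap_pairs(genes, sites))  # [(0, 0), (2, 2)]
```

Sorting each chromosome's intervals and bisecting on the query end skips candidates that start too late to overlap. A production tool would also avoid rescanning intervals that end before the query starts. Dataframe libraries such as bioframe return the matching rows themselves rather than index pairs.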
Bioframe is a library to enable flexible and performant operations on genomic interval dataframes in Python. Bioframe extends the Python data science stack to use cases for computational genome biology by building directly on top of two of the most commonly used Python libraries, NumPy and Pandas. The bioframe API enables flexible column names and orders, and decouples operations from data formats to avoid unnecessary conversions, a common scourge for bioinformaticians. Bioframe achieves these goals while maintaining high performance and a rich set of features.
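The idea of decoupling operations from data formats can be illustrated with an operation that takes the names of its chromosome, start, and end fields as a parameter, instead of requiring the input to be renamed or converted first. bioframe's functions accept `cols`-style arguments for a similar purpose (see its documentation). The sketch below is a hypothetical stand-in written with the standard library only. It computes per-chromosome coverage (the number of bases covered by the union of intervals) over records given as dictionaries. The function name `total_coverage` and the example field names are assumptions for illustration.

```python
def total_coverage(records, cols=("chrom", "start", "end")):
    """Return {chrom: number of bases covered by the union of intervals}.

    Hypothetical helper. `cols` names the chromosome, start, and end
    fields, so records with any column naming can be used directly.
    Coordinates are assumed to be 0-based and half-open.
    """
    chrom_col, start_col, end_col = cols

    # Collect (start, end) pairs per chromosome using the caller's field names.
    by_chrom = {}
    for record in records:
        by_chrom.setdefault(record[chrom_col], []).append(
            (record[start_col], record[end_col])
        )

    coverage = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        total, cur_start, cur_end = 0, None, None
        for start, end in intervals:
            if cur_end is None or start > cur_end:
                # Gap before this interval: close the current merged block.
                if cur_end is not None:
                    total += cur_end - cur_start
                cur_start, cur_end = start, end
            else:
                # Overlapping or book-ended: extend the current merged block.
                cur_end = max(cur_end, end)
        if cur_end is not None:
            total += cur_end - cur_start
        coverage[chrom] = total
    return coverage


# Records whose fields are not named chrom/start/end.
peaks = [
    {"chr": "chr1", "peak_start": 10, "peak_end": 20},
    {"chr": "chr1", "peak_start": 15, "peak_end": 30},
    {"chr": "chr2", "peak_start": 0, "peak_end": 5},
]
print(total_coverage(peaks, cols=("chr", "peak_start", "peak_end")))
# {'chr1': 20, 'chr2': 5}
```

Passing the field names as an argument keeps the operation independent of any one file format's column conventions, so no conversion step is needed before calling it.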
Bioframe is open-source under the MIT license, cross-platform, and can be installed from the Python Package Index (PyPI). The source code is maintained by Open2C on GitHub at https://github.com/open2c/bioframe.