Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, 02-097 Warsaw, Poland.
Bioinformatics. 2022 Apr 28;38(9):2633-2635. doi: 10.1093/bioinformatics/btac121.
The wealth of protein structures collected in the Protein Data Bank has enabled large-scale studies of protein function and evolution. Such studies, however, require the generation of customized datasets that combine the structural data with miscellaneous accessory resources providing functional, taxonomic and other annotations. Unfortunately, the functionality of currently available tools for creating such datasets is limited, and using them frequently requires laboriously surveying various data sources and resolving inconsistencies between their versions.
To address this problem, we developed localpdb, a versatile Python library for the management of protein structures and their annotations. The library features a flexible plugin system that enables seamless unification of the structural data with diverse auxiliary resources, full version control and powerful functionality for creating highly customized datasets. localpdb can be used in a wide range of bioinformatic tasks, in particular those involving large-scale protein structural analyses and machine learning.
localpdb is freely available at https://github.com/labstructbioinf/localpdb. Documentation, along with usage examples, can be accessed at https://labstructbioinf.github.io/localpdb/.
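
The idea of unifying versioned structural data with plugin-supplied annotations and then filtering it into a custom dataset can be illustrated with a minimal sketch. It uses only the Python standard library and is not the localpdb API: the `Snapshot` class, its `register_plugin` and `build_dataset` methods, the `taxonomy` plugin name and all entry identifiers and values are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical stand-in for one versioned copy of a structure database."""
    version: str   # label of the release this snapshot corresponds to
    entries: dict  # entry id -> core structural metadata
    plugins: dict = field(default_factory=dict)  # plugin name -> {entry id: annotations}

    def register_plugin(self, name, table):
        # Attach an auxiliary annotation table keyed by entry id.
        self.plugins[name] = table

    def build_dataset(self, predicate):
        # Join core metadata with every plugin table, then keep matching rows.
        rows = []
        for entry_id, core in sorted(self.entries.items()):
            row = {"pdb_id": entry_id, "version": self.version, **core}
            for name, table in self.plugins.items():
                for key, value in table.get(entry_id, {}).items():
                    row[f"{name}.{key}"] = value
            if predicate(row):
                rows.append(row)
        return rows


# Toy values, invented for this sketch (not real PDB entries).
snapshot = Snapshot(
    version="toy-release-1",
    entries={
        "toy1": {"method": "X-ray", "resolution": 1.8},
        "toy2": {"method": "NMR", "resolution": None},
        "toy3": {"method": "X-ray", "resolution": 3.2},
    },
)
snapshot.register_plugin(
    "taxonomy",
    {"toy1": {"organism": "E. coli"}, "toy3": {"organism": "H. sapiens"}},
)

# Select X-ray entries at 2.5 A resolution or better.
dataset = snapshot.build_dataset(
    lambda row: row["method"] == "X-ray" and row["resolution"] <= 2.5
)
```

Because every row carries the snapshot's version label, a dataset built this way can be traced back to the release it was derived from, which is the kind of consistency the abstract attributes to version control.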