Department of Computer Science, Brunel University London, Uxbridge UB8 3PH, United Kingdom.
Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae166.
Computational methods that detect correlated amino acid positions in proteins have become valuable tools for predicting intra- and inter-protein residue contacts, protein structures, and the effects of mutations on protein stability and function. Although many tools and web servers compute coevolution scoring matrices, there is no central repository of alignments and coevolution matrices for large-scale studies and pattern detection that leverage the biological and structural annotations already available in UniProt.
We present PyCoM, a Python library that enables users to query and analyze coevolution matrices and sequence alignments of 457 622 proteins selected from the UniProtKB/Swiss-Prot database (length ≤ 500 residues) and stored in a precompiled coevolution matrix database (PyCoMdb). PyCoM facilitates statistical analyses of residue coevolution patterns using filters on biological and structural annotations from UniProtKB/Swiss-Prot, and gives both novice and advanced users simple access to PyCoMdb through Jupyter Notebooks, Python scripts, and a web API. The resource is open source and will help in generating data-driven computational models and methods to study and understand protein structure, stability, function, and design.
The PyCoM code is freely available from https://github.com/scdantu/pycom; PyCoMdb and the Jupyter Notebook tutorials are freely available from https://pycom.brunel.ac.uk.
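As an illustration of the kind of analysis a coevolution matrix supports, the following minimal sketch ranks residue pairs by their coevolution score. It does not use the PyCoM API: the function name `top_coevolving_pairs`, its parameters, and the toy matrix are hypothetical, and it assumes only that a coevolution matrix is available as a symmetric L x L NumPy array of scores.

```python
import numpy as np


def top_coevolving_pairs(matrix, k=5, min_separation=3):
    """Return the k highest-scoring residue pairs (i, j, score) with j - i >= min_separation.

    `matrix` is assumed to be a symmetric L x L array of coevolution scores
    (hypothetical input format; indices are 0-based).
    """
    scores = np.asarray(matrix, dtype=float)
    if scores.ndim != 2 or scores.shape[0] != scores.shape[1]:
        raise ValueError("expected a square L x L matrix")

    # Upper-triangle indices, skipping pairs that are close in sequence.
    i_idx, j_idx = np.triu_indices(scores.shape[0], k=min_separation)
    pair_scores = scores[i_idx, j_idx]

    # Sort descending by score and keep the first k pairs.
    order = np.argsort(pair_scores)[::-1][:k]
    return [(int(i_idx[n]), int(j_idx[n]), float(pair_scores[n])) for n in order]


# Toy 8-residue matrix with random background scores below 0.5 (not real data).
rng = np.random.default_rng(0)
raw = rng.random((8, 8)) * 0.5
toy = (raw + raw.T) / 2          # symmetrize
toy[1, 6] = toy[6, 1] = 0.95     # planted long-range pair
toy[0, 1] = toy[1, 0] = 0.99     # adjacent pair, excluded by min_separation

result = top_coevolving_pairs(toy, k=3)
for i, j, score in result:
    print(f"residues {i} and {j}: score {score:.2f}")
```

Filtering out sequence-adjacent pairs is a common choice in contact analysis because neighbouring residues tend to score highly for trivial reasons; the threshold of 3 used here is an arbitrary illustrative value.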