Federal University of Rio de Janeiro, Cambridge, UK.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab182.
Rapid developments in gene sequencing technologies over recent decades, along with expanding knowledge of the three-dimensional structures of proteins, have enabled the construction of proteome-scale databases of protein models such as Genome3D and ModBase. However, although gene products are usually expressed as individual polypeptide chains, most biological processes are associated with either transient or stable oligomerisation. In the Protein Data Bank (PDB), for example, ~40% of the deposited structures contain at least one homo-oligomeric interface. Unfortunately, databases of protein models are generally devoid of multimeric structures. To tackle this issue, we have developed ProtCHOIR, a tool that generates homo-oligomeric structures in an automated fashion and provides detailed information on the input protein and the output complex. ProtCHOIR takes either a sequence or a protomeric structure as input, queries it against a pre-constructed local database of homo-oligomeric structures, and then analyzes it extensively using well-established tools such as PSI-BLAST, MAFFT, PISA and MolProbity. Finally, MODELLER is used to build the homo-oligomers. The output complex is thoroughly analyzed for stereochemical quality, interfacial stability, hydrophobicity and conservation profile. All these data are then summarized in a user-friendly HTML report that can be saved or printed as a PDF file. The software is easily parallelizable and also outputs a comma-separated file with summary statistics, which can be straightforwardly concatenated into a spreadsheet-like document for large-scale data analyses. As a proof of concept, we built oligomeric models for the Mabellini Mycobacterium abscessus structural proteome database. ProtCHOIR can be run as a web service, and the code can be obtained free of charge at http://lmdm.biof.ufrj.br/protchoir.
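The concatenation of per-run summary files mentioned above could be done along the following lines. This is a minimal sketch, not part of ProtCHOIR itself: the function name `concatenate_summaries` is hypothetical, and the sketch assumes only that each run writes one comma-separated file with a header row. The actual file names and column names produced by ProtCHOIR are not stated here, so none are assumed.

```python
import csv


def concatenate_summaries(csv_paths, out_path):
    """Merge per-run summary CSV files into a single table.

    Hypothetical helper: columns are unioned in first-seen order, and a
    value missing from a given run is written as an empty cell.
    Returns the number of data rows written.
    """
    fieldnames = []
    rows = []
    for path in csv_paths:
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle)
            # Collect any column names not seen in earlier files.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)

    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

In a large-scale run, `csv_paths` would typically be gathered with something like `sorted(pathlib.Path(results_dir).glob("**/*.csv"))`, where `results_dir` is whatever output directory was used. Unioning the headers rather than assuming identical columns keeps the merge robust if some runs report fewer statistics than others.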