Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy.
Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad495.
Coiled-coil domains (CCD) are widespread in all organisms and perform several crucial functions. Given their relevance, the computational detection of CCD is very important for protein functional annotation. State-of-the-art prediction methods address three tasks: the precise identification of CCD boundaries, the annotation of the typical heptad repeat pattern along the coiled-coil helices, and the prediction of the oligomerization state.
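The heptad repeat labels consecutive helix residues with register positions a through g, and positions a and d typically form the hydrophobic core. The minimal Python sketch below is not part of CoCoNat. It only illustrates what a residue-level register annotation looks like and how to check that it follows the a-to-g cycle. The function names `assign_register` and `is_continuous_register` are hypothetical.

```python
HEPTAD = "abcdefg"  # the seven register positions of a coiled-coil heptad


def assign_register(length: int, start: str = "a") -> str:
    """Label `length` consecutive helix residues with heptad positions,
    beginning at register position `start` and cycling a -> g -> a."""
    if len(start) != 1 or start not in HEPTAD:
        raise ValueError(f"unknown register position: {start!r}")
    offset = HEPTAD.index(start)
    return "".join(HEPTAD[(offset + i) % 7] for i in range(length))


def is_continuous_register(labels: str) -> bool:
    """True if each label is the successor of the previous one in the heptad
    cycle. `labels` is expected to contain only the characters a-g."""
    return all(
        HEPTAD[(HEPTAD.index(prev) + 1) % 7] == cur
        for prev, cur in zip(labels, labels[1:])
    )


if __name__ == "__main__":
    labels = assign_register(10, start="c")
    print(labels)                          # cdefgabcde
    print(is_continuous_register(labels))  # True
    print(is_continuous_register("abd"))   # False (b must be followed by c)
```

A predicted register that skips or repeats a position, such as "abd", is not a valid heptad annotation. This is the kind of constraint a register predictor has to respect.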
In this article, we describe CoCoNat, a novel method for predicting coiled-coil helix boundaries, residue-level register annotation, and oligomerization state. Our method encodes sequences with a combination of two state-of-the-art protein language models and implements a three-step deep learning procedure coupled with a Grammatical-Restrained Hidden Conditional Random Field for CCD identification and refinement. A final neural network predicts the oligomerization state. When tested on a routinely adopted blind test set, CoCoNat outperforms the current state of the art at both the residue level and the segment level of CCD prediction. CoCoNat also significantly outperforms the most recent state-of-the-art methods on register annotation and on prediction of oligomerization states.
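CoCoNat refines the per-residue network output with a Grammatical-Restrained Hidden Conditional Random Field, whose grammar rules out label sequences that are not valid coiled-coil annotations. The sketch below is not that model: it has no trained parameters and no hidden states. It is a plain Viterbi decoder over hypothetical per-residue log-scores, assumed to come from an upstream network, restricted by a toy grammar. Under this grammar a residue is either outside a coiled coil (`i`) or carries a register label, and register labels must advance one heptad position at a time (a to b, ..., g to a). The state names, the grammar, and the scores are illustrative assumptions. The actual CoCoNat grammar, for example any minimum segment length, may differ.

```python
HEPTAD = "abcdefg"
STATES = ["i"] + list(HEPTAD)  # "i": residue outside any coiled-coil segment


def allowed(prev: str, cur: str) -> bool:
    """Toy grammar (assumption): outside -> anything (segment start or gap),
    register -> outside (segment end), register -> next heptad position."""
    if prev == "i" or cur == "i":
        return True
    return HEPTAD[(HEPTAD.index(prev) + 1) % 7] == cur


def constrained_viterbi(scores: list[dict[str, float]]) -> str:
    """Return the highest-scoring label string that obeys `allowed`.
    `scores[t][s]` is a hypothetical log-score for state s at residue t."""
    if not scores:
        return ""
    best = {s: scores[0][s] for s in STATES}
    back: list[dict[str, str]] = []  # back[t-1][cur] = best predecessor of cur
    for t in range(1, len(scores)):
        new_best: dict[str, float] = {}
        ptr: dict[str, str] = {}
        for cur in STATES:
            # only predecessors permitted by the grammar are considered
            score, prev = max((best[p], p) for p in STATES if allowed(p, cur))
            new_best[cur] = score + scores[t][cur]
            ptr[cur] = prev
        best = new_best
        back.append(ptr)
    # trace back from the best final state
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    # Hypothetical scores: each residue prefers one label (0.0), all others
    # get -5.0. Residue 6 is a noisy call: it prefers "a", which breaks the
    # heptad cycle after "d", with "e" as runner-up (-1.0).
    raw = "iiabcdafgii"
    scores = [{s: (0.0 if s == c else -5.0) for s in STATES} for c in raw]
    scores[6]["e"] = -1.0
    print(constrained_viterbi(scores))  # iiabcdefgii
```

A per-residue argmax would keep the invalid "a" at residue 6. Decoding under the grammar replaces it with "e", because that path is the best one the grammar permits. Constraining transitions during decoding, rather than filtering labels afterwards, lets context on both sides of a low-confidence residue correct it.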
CoCoNat web server is available at https://coconat.biocomp.unibo.it. Standalone version is available on GitHub at https://github.com/BolognaBiocomp/coconat.