Biocomputing Group, Department of Biology, University of Bologna, 40126 Bologna, Italy.
Bioinformatics. 2009 Nov 1;25(21):2757-63. doi: 10.1093/bioinformatics/btp539. Epub 2009 Sep 10.
The widespread coiled-coil structural motif in proteins is known to mediate a variety of biological interactions. Recognizing a coiled-coil containing sequence and locating its coiled-coil domains are key steps towards the determination of the protein structure and function. Different tools are available for predicting coiled-coil domains in protein sequences, including those based on position-specific score matrices and machine learning methods.
In this article, we introduce a hidden Markov model (CCHMM_PROF) that exploits the information contained in multiple sequence alignments (profiles) to predict coiled-coil regions. The new method discriminates coiled-coil sequences with an accuracy of 97% and achieves a true positive rate of 79% at a false positive rate of only 1%. Furthermore, when predicting the location of coiled-coil segments in protein sequences, the method reaches an accuracy of 80% at the residue level and a best per-segment and per-protein efficiency of 81% and 80%, respectively. The results indicate that CCHMM_PROF outperforms all existing tools and can be adopted for large-scale genome annotation.
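To illustrate how a hidden Markov model can label residues from a profile, the following is a minimal sketch and not the CCHMM_PROF model itself. It assumes a hypothetical two-state topology (coiled coil vs. non-coiled-coil), a toy three-class residue alphabet, invented probabilities, and standard Viterbi decoding, none of which are specified in the abstract. Each profile column is scored as the dot product between the column frequencies and the state's emission distribution, which is one common way to feed alignment profiles to an HMM.

```python
import math

# Hypothetical two-state model: "C" = coiled coil, "N" = non-coiled-coil.
# All probabilities below are invented for illustration only.
STATES = ("N", "C")
START = {"N": 0.9, "C": 0.1}
TRANS = {
    "N": {"N": 0.9, "C": 0.1},
    "C": {"N": 0.1, "C": 0.9},
}
# Toy alphabet of three residue classes: (hydrophobic, polar, other).
EMIT = {
    "N": [0.2, 0.3, 0.5],
    "C": [0.5, 0.4, 0.1],
}


def column_score(state, column):
    """Emission score of a profile column: dot product of the state's
    emission distribution with the column's residue-class frequencies."""
    return sum(e * f for e, f in zip(EMIT[state], column))


def viterbi(profile):
    """Return the most probable state path for a profile, given as a list
    of frequency vectors (one per alignment column, each summing to 1)."""
    # Work in log space to avoid numerical underflow on long sequences.
    scores = {
        s: math.log(START[s]) + math.log(column_score(s, profile[0]))
        for s in STATES
    }
    backpointers = []
    for column in profile[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for reaching state s at this column.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[prev]
                + math.log(TRANS[prev][s])
                + math.log(column_score(s, column))
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    other = [0.0, 0.0, 1.0]
    hydrophobic = [1.0, 0.0, 0.0]
    toy_profile = [other] * 3 + [hydrophobic] * 6 + [other] * 3
    print(viterbi(toy_profile))
```

In this toy setting a run of hydrophobic-rich columns must be long enough to outweigh the cost of entering and leaving the coiled-coil state, which is why short runs are left unlabeled.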
The dataset is available at http://www.biocomp.unibo.it/~lisa/coiled-coils. The predictor is freely available at http://gpcr.biocomp.unibo.it/cgi/predictors/cchmmprof/pred_cchmmprof.cgi.