Department of Chemistry, Tongji University, Shanghai, China.
PLoS One. 2013 Apr 11;8(4):e60559. doi: 10.1371/journal.pone.0060559. Print 2013.
The precise prediction of protein domains, which are the structural, functional and evolutionary units of proteins, has been a research focus in recent years. Although many methods have been presented for predicting protein domains and boundaries, the accuracy of predictions could be improved.
In this study we present a novel approach, DomHR, an accurate predictor of protein domain boundaries based on a hinge region strategy. A hinge region was defined as a segment of amino acids that covers part of a domain region and part of a boundary region. We developed a strategy to construct profiles of domain-hinge-boundary (DHB) features, generated by sequence-domain/hinge/boundary alignment against a database of known domain structures. The DHB features had three elements: normalized domain, hinge, and boundary probabilities. These features were used as input to identify domain boundaries in a sequence. DomHR used a nonredundant dataset as the training set, the DHB features and predicted shape strings as features, and a conditional random field as the classification algorithm. In predicted hinge regions, each residue was assigned to a domain or a boundary according to a decision threshold. After the decision thresholds were optimized, DomHR was evaluated by cross-validation, large-scale prediction, an independent test and CASP (Critical Assessment of Techniques for Protein Structure Prediction) tests. All results confirmed that DomHR outperformed other well-established, publicly available domain boundary predictors in prediction accuracy.
DomHR is available at http://cal.tongji.edu.cn/domain/.
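The two steps the abstract describes most concretely, normalizing the three DHB elements and resolving hinge residues with a decision threshold, can be illustrated with a minimal Python sketch. This is not the DomHR implementation. The abstract does not say how the probabilities are normalized or which score the threshold is applied to, so the sum-to-one normalization, the per-residue boundary score, the `D`/`H`/`B` label alphabet, the default threshold of 0.5, and the function names `normalize_dhb` and `resolve_hinge` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def normalize_dhb(counts: Sequence[float]) -> Tuple[float, float, float]:
    """Convert raw (domain, hinge, boundary) alignment counts into
    probabilities that sum to 1.

    Assumption: simple sum-to-one normalization; the paper's exact
    normalization is not given in the abstract.
    """
    domain, hinge, boundary = counts
    total = domain + hinge + boundary
    if total == 0:
        # No alignment evidence: fall back to a uniform distribution.
        return (1 / 3, 1 / 3, 1 / 3)
    return (domain / total, hinge / total, boundary / total)


def resolve_hinge(
    labels: Sequence[str],
    boundary_scores: Sequence[float],
    threshold: float = 0.5,  # hypothetical default, not from the paper
) -> List[str]:
    """Reassign each predicted hinge residue ('H') to boundary ('B') or
    domain ('D') by comparing its boundary score with the threshold.
    Residues already labelled 'D' or 'B' are left unchanged.
    """
    if len(labels) != len(boundary_scores):
        raise ValueError("labels and boundary_scores must have equal length")
    resolved = []
    for label, score in zip(labels, boundary_scores):
        if label == "H":
            resolved.append("B" if score >= threshold else "D")
        else:
            resolved.append(label)
    return resolved
```

In this sketch, raising `threshold` assigns fewer hinge residues to the boundary class and lowering it assigns more. That is the trade-off the optimization of decision thresholds would tune, under the assumptions stated above.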