BMC Bioinformatics. 2013;14 Suppl 14(Suppl 14):S12. doi: 10.1186/1471-2105-14-S14-S12. Epub 2013 Oct 9.
In recent years, the use and importance of predicted protein residue-residue contacts have grown considerably, with demonstrated applications in drug design, protein tertiary structure prediction and model quality assessment. Nevertheless, reported accuracies in the range of 25-35% stubbornly remain the norm for sequence-based, long-range contact predictions on hard targets, despite a prolonged community effort to improve the performance of residue-residue contact prediction. A thorough study of the quality of current residue-residue contact predictions, of the evaluation metrics used, and of current methods is needed to stimulate further advancement in contact prediction and its application. Such a study will better explain the quality and nature of the residue-residue contact predictions generated by current methods and, as a result, lead to better use of this contact information.
We evaluated several sequence-based residue-residue contact predictors that participated in the tenth Critical Assessment of protein Structure Prediction (CASP) experiment. The evaluation used standard assessment techniques, such as those used by the official CASP assessors, as well as two novel evaluation metrics (i.e., cluster accuracy and cluster count). An in-depth analysis revealed that while most residue-residue contact predictions are not accurate at the residue level, a strong contact signal is present when less than residue-level precision is allowed. Our residue-residue contact predictor, DNcon, performed particularly well, achieving an accuracy of 66% for the top L/10 long-range contacts (where L is the sequence length) when evaluated in a neighbourhood of size 2. Coverage of residue-residue contact areas was also greater with DNcon than with other methods. We also provide an analysis of DNcon with respect to its underlying architecture and the features used for classification.
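To make the neighbourhood-based evaluation concrete, the following is a minimal sketch of how a top L/10 long-range accuracy with a tolerance of δ residues might be computed. It is an illustration under stated assumptions, not the evaluation code used in the study: the function name `top_long_range_accuracy`, its parameters, the sequence-separation threshold of 24 residues for "long range" (the usual CASP convention), and the rule that a predicted pair (i, j) counts as correct when a true contact lies within δ positions of both i and j are all assumptions introduced here. The cluster accuracy and cluster count metrics are not sketched because their definitions are not given in this abstract.

```python
def top_long_range_accuracy(scored_pairs, true_contacts, seq_len,
                            fraction=10, min_sep=24, delta=0):
    """Accuracy of the top L/fraction long-range predicted contacts.

    scored_pairs:  iterable of (i, j, score) with residue indices i < j.
    true_contacts: set of (i, j) pairs (i < j) that are in contact in the
                   native structure.
    min_sep:       assumed minimum sequence separation for "long range".
    delta:         assumed neighbourhood size; (i, j) is counted as correct
                   if some true contact (a, b) has |a - i| <= delta and
                   |b - j| <= delta. delta=0 is residue-level evaluation.
    """
    # Keep only long-range pairs and rank them by predicted score.
    long_range = [p for p in scored_pairs if p[1] - p[0] >= min_sep]
    long_range.sort(key=lambda p: p[2], reverse=True)

    # Select the top L/fraction predictions (at least one).
    selected = long_range[:max(1, seq_len // fraction)]
    if not selected:
        return 0.0

    def is_hit(i, j):
        # Search the (2*delta + 1) x (2*delta + 1) window around (i, j).
        return any((a, b) in true_contacts
                   for a in range(i - delta, i + delta + 1)
                   for b in range(j - delta, j + delta + 1))

    correct = sum(1 for i, j, _ in selected if is_hit(i, j))
    return correct / len(selected)


# Hypothetical toy data for a 40-residue protein (top L/10 = 4 pairs).
predictions = [(1, 30, 0.9), (2, 35, 0.8), (5, 33, 0.7),
               (8, 39, 0.6), (10, 12, 0.99), (0, 39, 0.1)]
native = {(1, 30), (3, 36), (5, 31)}

strict = top_long_range_accuracy(predictions, native, 40, delta=0)
relaxed = top_long_range_accuracy(predictions, native, 40, delta=2)
```

On this toy input, the residue-level score counts only the exact match, whereas the relaxed score also credits predictions that fall within two residues of a true contact. This is the kind of near-miss signal the neighbourhood evaluation is meant to expose.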
Our novel evaluation metrics demonstrate that current residue-residue contact predictions do contain a strong contact signal and are of better quality than standard evaluation metrics indicate. Our method, DNcon, is a robust, state-of-the-art, sequence-based residue-residue contact predictor that excelled under a number of evaluation schemes. It is available as a web service at http://iris.rnet.missouri.edu/dncon/.