A study of quality measures for protein threading models.

Authors

Cristobal S, Zemla A, Fischer D, Rychlewski L, Elofsson A

Affiliation

Cell and Molecular Biology Department, Box 596. BMC Uppsala University, SE-751 24 Uppsala, Sweden.

Publication

BMC Bioinformatics. 2001;2:5. doi: 10.1186/1471-2105-2-5. Epub 2001 Aug 1.

Abstract

BACKGROUND

Prediction of protein structures is one of the fundamental challenges in biology today. To fully understand how well different prediction methods perform, it is necessary to use measures that evaluate their performance. Every two years, starting in 1994, the CASP (Critical Assessment of protein Structure Prediction) process has been organized to evaluate the ability of different predictors to blindly predict the structure of proteins. To capture different features of the models, several measures have been developed during the CASP processes. However, these measures have not been examined in detail before. In an attempt to develop fully automatic measures that can be used in CASP, as well as in other types of benchmarking experiments, we have compared twenty-one measures. These include the measures used in CASP3 and CASP2 as well as measures introduced later. We have studied their ability to distinguish between the better and worse models submitted to CASP3, and the correlation between them.

RESULTS

Using a small set of 1340 models for 23 different targets, we show that most measures correlate with each other. Most pairs of measures show a correlation coefficient of about 0.5. The correlation is slightly higher for measures of similar types. We found that a significant problem when developing automatic measures is how to deal with proteins of different lengths. Comparison between different measures is also complicated, as many measures depend on the size of the target. We show that the manual assessment can be reproduced to about 70% using automatic measures. Alignment-independent measures detect slightly more of the models with the correct fold, while alignment-dependent measures agree better when selecting the best models for each target. Finally, we show that using automatic measures would, to a large extent, reproduce the assessors' ranking of the predictors at CASP3.
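As an illustration of the pairwise comparison described above, the following is a minimal sketch of how correlation coefficients between quality measures could be computed over a common set of models. It assumes Pearson correlation, which the abstract does not specify, and the measure names and scores are hypothetical toy values, not data from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(scores):
    """Map each unordered pair of measure names to its correlation.

    `scores` maps a measure name to a list of scores, one per model,
    with the models in the same order for every measure.
    """
    return {
        (a, b): pearson(scores[a], scores[b])
        for a, b in combinations(sorted(scores), 2)
    }


# Hypothetical scores for five models under three made-up measures.
toy_scores = {
    "measure_a": [0.10, 0.40, 0.35, 0.80, 0.60],
    "measure_b": [12.0, 30.0, 41.0, 75.0, 50.0],
    "measure_c": [0.90, 0.20, 0.50, 0.40, 0.70],
}

for pair, r in pairwise_correlations(toy_scores).items():
    print(pair, round(r, 2))
```

Because many measures depend on target size, as noted above, pooling models from targets of very different lengths can distort such coefficients; computing them per target is one possible way to limit that effect.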

CONCLUSIONS

We show that, given a sufficient number of targets, the manual and automatic measures would have given almost identical results at CASP3. If the intent is to reproduce the type of scoring done by the manual assessor in CASP3, the best approach might be to use a combination of alignment-independent and alignment-dependent measures, as used in several recent studies.
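One way such a combination could look is sketched below. This is an assumption for illustration only, not the scheme used by the authors or by the studies they refer to: each measure is converted to z-scores over the models for a single target, and the two z-scores are averaged. Both measures are assumed to be "higher is better", and all names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize scores to zero mean and unit variance (all zeros if constant)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def combined_ranking(models, independent, dependent):
    """Rank models for one target by the average of two standardized measures.

    `independent` and `dependent` hold one score per model, in the same
    order as `models`; higher scores are assumed to be better for both.
    """
    if not (len(models) == len(independent) == len(dependent)):
        raise ValueError("all inputs must have the same length")
    z_ind = z_scores(independent)
    z_dep = z_scores(dependent)
    combined = [(a + b) / 2 for a, b in zip(z_ind, z_dep)]
    return sorted(zip(models, combined), key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for three models of one target.
ranking = combined_ranking(
    models=["m1", "m2", "m3"],
    independent=[0.2, 0.8, 0.5],  # made-up alignment-independent scores
    dependent=[10, 30, 40],       # made-up alignment-dependent scores
)
for name, score in ranking:
    print(name, round(score, 2))
```

Standardizing within a target puts the two measures on a common scale before averaging, so that neither dominates simply because of its numeric range.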

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2918/55330/b63ce5c78203/1471-2105-2-5-1.jpg
