A generic method for assignment of reliability scores applied to solvent accessibility predictions.

Authors

Petersen Bent, Petersen Thomas Nordahl, Andersen Pernille, Nielsen Morten, Lundegaard Claus

Affiliation

Center for Biological Sequence Analysis-CBS, Department of Systems Biology, Kemitorvet 208, Technical University of Denmark-DTU, Lyngby, Denmark.

Publication

BMC Struct Biol. 2009 Jul 31;9:51. doi: 10.1186/1472-6807-9-51.

Abstract

BACKGROUND

Estimating the reliability of specific real-value predictions is nontrivial, and the efficacy of such estimates is often questionable. It is important to know whether a given prediction can be trusted, and therefore the best methods associate each prediction with a reliability score or index. For discrete qualitative predictions, the reliability is conventionally estimated as the difference between output scores of selected classes. Such an approach is not feasible for methods that predict a biological feature as a single real value rather than a classification. As a solution to this challenge, we have implemented a method that predicts the relative surface accessibility of an amino acid and simultaneously predicts the reliability of each prediction, in the form of a Z-score.
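To make the conventional approach for discrete predictions concrete, the following is a minimal sketch, not the authors' code. It assumes that the "selected classes" are the two highest-scoring ones, which is a common choice but is not stated in the abstract; the function name `classification_reliability` is hypothetical.

```python
def classification_reliability(scores):
    """Return (predicted_class_index, reliability) for one prediction.

    Assumption (not from the abstract): reliability is the gap between
    the highest and the second-highest class output score.
    """
    if len(scores) < 2:
        raise ValueError("need at least two class scores")
    # Rank class indices from highest to lowest output score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best, scores[best] - scores[runner_up]


if __name__ == "__main__":
    # Three hypothetical class scores; the second class wins clearly.
    print(classification_reliability([0.125, 0.75, 0.25]))
```

A single real-valued output has no runner-up class to compare against, which is why this kind of score cannot be computed for real-value predictors.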

RESULTS

An ensemble of artificial neural networks has been trained on a set of experimentally solved protein structures to predict the relative exposure of the amino acids. The method assigns a reliability score to each surface accessibility prediction as an inherent part of the training process. This is in contrast to the most commonly used procedures where reliabilities are obtained by post-processing the output.

CONCLUSION

The performance of the neural networks was evaluated on a commonly used set of sequences known as the CB513 set. An overall Pearson's correlation coefficient of 0.72 was obtained, which is comparable to the performance of the best currently publicly available method, Real-SPINE. Both methods associate a reliability score with the individual predictions. However, our implementation of reliability scores in the form of a Z-score is shown to be the more informative measure for discriminating good predictions from bad ones over the entire range from completely buried to fully exposed amino acids. This is evident when comparing the Pearson's correlation coefficient for the upper 20% of predictions sorted according to reliability. For this subset, values of 0.79 and 0.74 are obtained using our method and the compared method, respectively. The same tendency holds for any selected subset.
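The subset comparison described above can be sketched as follows. This is an illustrative sketch only: the function names, the rounding of the subset size, and the assumption that a higher reliability value means a more trustworthy prediction are all hypothetical choices, not details taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def pearson_for_most_reliable(predicted, observed, reliability, fraction=0.2):
    """Correlation restricted to the most reliable fraction of predictions.

    Assumption: larger reliability values indicate more trusted predictions.
    """
    # Sort prediction indices from most to least reliable.
    order = sorted(range(len(predicted)),
                   key=lambda i: reliability[i], reverse=True)
    # Keep at least two points so the correlation is defined.
    keep = max(2, int(round(fraction * len(order))))
    top = order[:keep]
    return pearson([predicted[i] for i in top], [observed[i] for i in top])
```

Calling `pearson_for_most_reliable` with `fraction=0.2` on each method's predictions and reliability scores would give the kind of top-20% comparison reported above; sweeping `fraction` over several values would show whether the tendency holds for other subsets.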

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c67b/2725087/1ad70b34f614/1472-6807-9-51-1.jpg
