Critical assessment of coiled-coil predictions based on protein structure data.

Affiliations

Group Systems Biology of Motor Proteins, Department of NMR-Based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Göttingen, Germany.

Theoretical Computer Science and Algorithmic Methods, Institute of Computer Science, Georg-August-University Göttingen, Göttingen, Germany.

Publication information

Sci Rep. 2021 Jun 14;11(1):12439. doi: 10.1038/s41598-021-91886-w.

Abstract

Coiled-coil regions were among the first protein motifs described structurally and theoretically. The simplicity of the motif promises that coiled-coil regions can be detected with reasonable accuracy and precision in any protein sequence. Here, we re-evaluated the most commonly used coiled-coil prediction tools with respect to the most comprehensive reference data set available, the entire Protein Data Bank, down to each amino acid and its secondary structure. Apart from the 30-fold difference between the minimum and maximum number of coiled coils predicted, the tools vary strongly in where they predict coiled-coil regions. Accordingly, there is a high number of false predictions and of missed true coiled-coil regions. The evaluation of the binary classification metrics in comparison with naïve coin-flip models, together with the calculation of the Matthews correlation coefficient, the most reliable performance metric for imbalanced data sets, suggests that the tested tools' performance is close to random. This implies that the tools' predictions have only limited informative value. Coiled-coil predictions are often used to interpret biochemical data and are part of in-silico functional genome annotation. Our results indicate that these predictions should be treated very cautiously and need to be supported and validated by experimental evidence.
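The comparison against a coin-flip model can be illustrated with a minimal sketch. It is not the authors' evaluation pipeline: the toy data (1000 residues, 5% of them labelled coiled coil), the function names and the random seed are all assumptions made for illustration. It shows why accuracy alone is misleading on imbalanced per-residue labels, while the Matthews correlation coefficient (MCC) stays near zero for uninformative predictors.

```python
import math
import random


def confusion_counts(truth, pred):
    """Count TP, TN, FP, FN over paired per-residue boolean labels."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column sum is zero
    return (tp * tn - fp * fn) / denom


# Hypothetical toy data: 50 coiled-coil residues among 1000 (imbalanced).
truth = [True] * 50 + [False] * 950

# Naive baseline 1: never predict a coiled coil.
all_negative = [False] * len(truth)

# Naive baseline 2: a fair coin flip per residue (seed fixed for repeatability).
rng = random.Random(0)
coin_flip = [rng.random() < 0.5 for _ in truth]

for name, pred in [("all-negative", all_negative), ("coin-flip", coin_flip)]:
    counts = confusion_counts(truth, pred)
    print(f"{name}: accuracy={accuracy(*counts):.3f}, MCC={mcc(*counts):.3f}")
```

In this toy setting the all-negative baseline reaches 95% accuracy although it finds no coiled-coil residue at all, and its MCC is 0. A tool whose MCC is not clearly above such baselines adds little information, which is the sense in which the abstract describes performance as close to random.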

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cd3/8203680/c334e02917a0/41598_2021_91886_Fig1_HTML.jpg