CSI 2.0: a significantly improved version of the Chemical Shift Index.

Authors

Hafsa Noor E, Wishart David S

Affiliation

Department of Computing Science, University of Alberta, Edmonton, Canada.

Publication

J Biomol NMR. 2014 Nov;60(2-3):131-46. doi: 10.1007/s10858-014-9863-x. Epub 2014 Oct 2.

Abstract

Protein chemical shifts have long been used by NMR spectroscopists to assist with secondary structure assignment and to provide useful distance and torsion angle constraint data for structure determination. One of the most widely used methods for secondary structure identification is the Chemical Shift Index (CSI). The CSI method uses a simple digital chemical shift filter to locate secondary structures along the protein chain using backbone (13)C and (1)H chemical shifts. While the CSI method is simple to use and easy to implement, it is only about 75-80% accurate. Here we describe a significantly improved version of the CSI (2.0) that uses machine-learning techniques to combine all six backbone chemical shifts ((13)Cα, (13)Cβ, (13)C, (15)N, (1)HN, (1)Hα) with sequence-derived features to perform far more accurate secondary structure identification. Our tests indicate that CSI 2.0 achieved an average identification accuracy (Q3) of 90.56% for a training set of 181 proteins in a repeated tenfold cross-validation and 89.35% for a test set of 59 proteins. This represents a significant improvement over other state-of-the-art chemical shift-based methods. In particular, the level of performance of CSI 2.0 is equal to that of standard methods, such as DSSP and STRIDE, used to identify secondary structures via 3D coordinate data. This suggests that CSI 2.0 could be used both to provide accurate NMR constraint data in the early stages of protein structure determination and to define secondary structure locations in the final protein model(s). A CSI 2.0 web server (http://csi.wishartlab.com) is available for submitting queries for secondary structure identification.
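
The Q3 figures quoted above are three-state accuracies: the percentage of residues whose predicted state (helix, strand, or coil) matches a reference assignment. The following is a minimal sketch of that calculation for a single protein. The function name `q3_accuracy`, the H/E/C letter coding, and the toy strings are illustrative assumptions, not data or code from the paper, and the abstract does not state how per-protein scores were averaged.

```python
def q3_accuracy(predicted: str, reference: str) -> float:
    """Return the percentage of residues whose three-state label matches.

    Both arguments are per-residue strings over an assumed alphabet:
    H (helix), E (strand), C (coil).
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    # Count positions where the predicted state equals the reference state.
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)


# Hypothetical toy example (not taken from the paper).
reference = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHHCCCEEEECCC"
q3 = q3_accuracy(predicted, reference)
print(f"Q3 = {q3:.2f}%")
```

In practice the reference string would come from a coordinate-based method such as DSSP or STRIDE after reduction to three states, which is the kind of comparison the abstract describes.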
