Mao Wusong, Cong Peisheng, Wang Zhiheng, Lu Longjian, Zhu Zhongliang, Li Tonghua
Department of Chemistry, Tongji University, Shanghai, China.
PLoS One. 2013 Dec 23;8(12):e83532. doi: 10.1371/journal.pone.0083532. eCollection 2013.
A shape string is a structural sequence and an extremely important representation of protein backbone conformations. Nuclear magnetic resonance (NMR) chemical shifts correlate strongly with local protein structure and are exploited, in conjunction with computational approaches, to predict protein structures. Here we demonstrate a novel approach, NMRDSP, which accurately predicts the protein shape string from NMR chemical shifts and structure profiles obtained from sequence data. NMRDSP uses six chemical shifts (HA, H, N, CA, CB and C) and eight structure-profile elements as features, a non-redundant set (1,003 entries) as the training set, and a conditional random field as the classification algorithm. On an independent testing set (203 entries), we achieved an accuracy of 75.8% for S8 (eight-state accuracy) and 87.8% for S3 (three-state accuracy). These accuracies are higher than those obtained using only chemical shifts or only sequence data, confirming that the chemical shift and the structure profile are significant features for shape string prediction and that their combination markedly improves the accuracy of the predictor. We have constructed the NMRDSP web server and believe it could provide a solid platform for predicting other protein structures and functions. The NMRDSP web server is freely available at http://cal.tongji.edu.cn/NMRDSP/index.jsp.
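The feature layout and the two accuracy measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the placeholder value for unassigned shifts, the eight-letter shape alphabet, and the grouping of eight states into three are all assumptions made for illustration; the abstract does not state them. The conditional random field itself is not shown.

```python
from typing import Dict, List, Optional, Sequence

# Atom order follows the abstract: HA, H, N, CA, CB and C.
SHIFT_ATOMS = ("HA", "H", "N", "CA", "CB", "C")
N_PROFILE = 8  # eight structure-profile elements per residue

# ASSUMPTION: eight-state alphabet and its three-state grouping.
# Neither is specified in the abstract; adjust to the real definition.
S8_TO_S3 = {
    "S": "S", "R": "S", "U": "S", "V": "S",
    "K": "H", "A": "H",
    "T": "T", "G": "T",
}


def residue_features(
    shifts: Dict[str, Optional[float]],
    profile: Sequence[float],
    missing: float = 0.0,  # ASSUMPTION: placeholder for unassigned shifts
) -> List[float]:
    """Concatenate six chemical shifts and eight profile values (14 features)."""
    if len(profile) != N_PROFILE:
        raise ValueError(f"expected {N_PROFILE} profile values, got {len(profile)}")
    shift_part = [
        missing if shifts.get(atom) is None else float(shifts[atom])
        for atom in SHIFT_ATOMS
    ]
    return shift_part + [float(value) for value in profile]


def to_three_state(shape_string: str) -> str:
    """Collapse an eight-state shape string using the assumed grouping."""
    return "".join(S8_TO_S3[symbol] for symbol in shape_string)


def accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings differ in length")
    if not observed:
        raise ValueError("cannot score an empty shape string")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def s8_and_s3(predicted: str, observed: str) -> Dict[str, float]:
    """Score one prediction at both the eight-state and three-state level."""
    return {
        "S8": accuracy(predicted, observed),
        "S3": accuracy(to_three_state(predicted), to_three_state(observed)),
    }
```

For example, `accuracy("SRAAKT", "SSAAKG")` counts four matches out of six at the eight-state level, while the same pair agrees at all six positions once both strings are passed through `to_three_state`. This shows why S3 is never lower than S8 for the same prediction under any such grouping.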