Zhao Huiying, Yang Yuedong, von Itzstein Mark, Zhou Yaoqi
Indiana University School of Informatics, Indiana University Purdue University, Indianapolis, 719 Indiana Ave, Suite 319, Indianapolis, Indiana, 46202; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, 46202.
J Comput Chem. 2014 Nov 15;35(30):2177-83. doi: 10.1002/jcc.23730. Epub 2014 Sep 15.
Carbohydrate-binding proteins (CBPs) are potential biomarkers and drug targets. However, interactions between carbohydrates and proteins are challenging to study both experimentally and computationally because of their low binding affinity and high flexibility, and because carbohydrates lack the linear sequence found in RNA, DNA, and proteins. Here, we describe a structure-based function-prediction technique called SPOT-Struc that identifies carbohydrate-recognizing proteins and their binding amino acid residues by combining the structural alignment program SPalign with binding affinity scoring from a knowledge-based statistical potential built on the distance-scaled, finite ideal-gas reference state (DFIRE). Leave-one-out cross-validation of the method on 113 carbohydrate-binding domains and 3442 non-carbohydrate-binding proteins yields a Matthews correlation coefficient of 0.56 for SPalign alone and 0.63 for SPOT-Struc (SPalign + binding affinity scoring) for CBP prediction. SPOT-Struc has a high positive predictive value (79% of all positive CBP predictions are correct) and a reasonable sensitivity (52% of all CBPs are predicted as positive). The sensitivity changed only slightly when the method was applied to 31 APO (unbound) structures found in the Protein Data Bank (14/31 for APO versus 15/31 for HOLO). The results of SPOT-Struc do not change significantly if highly homologous templates are used. SPOT-Struc predicted 19 out of 2076 structural genomics targets as CBPs. In particular, one uncharacterized protein in Bacillus subtilis (1oq1A) was matched to galectin-9 from Mus musculus. Thus, SPOT-Struc is useful for uncovering novel carbohydrate-binding proteins. SPOT-Struc is available at http://sparks-lab.org.
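The three performance measures quoted above (sensitivity, positive predictive value, and Matthews correlation coefficient) are all functions of a single confusion matrix. The sketch below shows how they relate. The function name `confusion_metrics` and the four counts are assumptions for illustration only: the counts are hypothetical values chosen to be roughly consistent with the reported 113 CBPs, 3442 non-CBPs, 52% sensitivity, and 79% positive predictive value, and are not taken from the paper.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return (sensitivity, positive predictive value, MCC) for binary predictions."""
    sensitivity = tp / (tp + fn)   # fraction of true CBPs that are predicted as CBPs
    ppv = tp / (tp + fp)           # fraction of positive predictions that are correct
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, ppv, mcc


# Hypothetical counts (NOT from the paper), picked so that
# tp + fn = 113 binding domains and fp + tn = 3442 non-binding proteins.
tp, fn = 59, 54
fp, tn = 16, 3426

sens, ppv, mcc = confusion_metrics(tp, fp, tn, fn)
print(f"sensitivity={sens:.2f}  PPV={ppv:.2f}  MCC={mcc:.2f}")
```

Because non-binding proteins outnumber binding domains by roughly 30 to 1, accuracy alone would be dominated by true negatives; the Matthews correlation coefficient uses all four cells of the matrix, which makes it a more informative single number for such an unbalanced set.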