Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
BMC Bioinformatics. 2012 May 31;13:118. doi: 10.1186/1471-2105-13-118.
Adenosine-5'-triphosphate (ATP) is a multifunctional nucleotide that plays an important role in cell biology as a coenzyme interacting with proteins. Identifying protein-ATP binding sites is important for understanding protein function and the mechanisms of protein-ATP complexes.
In this paper, we propose a novel framework for predicting the functional residues through which proteins bind ATP molecules. The new prediction protocol combines sequence evolutionary information, bi-profile sampling of multi-view sequence features, and sequence-derived structural features. The hypothesis behind this strategy is that a single-view feature captures only part of the knowledge about the target, whereas multiple sources of descriptors can be complementary.
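Bi-profile sampling is named above but not defined. As a rough illustration of the general bi-profile idea (as used in bi-profile Bayes feature extraction), the sketch below encodes a fixed-length residue window by looking up, at each position, the residue's frequency among positive (ATP-binding) training windows and among negative training windows, then concatenating the two profiles. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `position_profile` and `bi_profile_features`, the 20-letter alphabet, and the use of raw frequencies without smoothing are all hypothetical choices, and it covers only one view (residue identity) rather than the multiple views the framework combines.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Assumed alphabet: the 20 standard amino acids; any other symbol
# (e.g. 'X' used as padding at chain ends) gets frequency 0.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# One dict per window position, mapping residue -> frequency at that position.
Profile = List[Dict[str, float]]


def position_profile(windows: Sequence[str]) -> Profile:
    """Per-position residue frequencies over equal-length sequence windows."""
    if not windows:
        raise ValueError("need at least one window")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    total = len(windows)
    profile: Profile = []
    for i in range(length):
        counts = Counter(w[i] for w in windows)  # missing residues count as 0
        profile.append({aa: counts[aa] / total for aa in AMINO_ACIDS})
    return profile


def bi_profile_features(window: str, pos_profile: Profile, neg_profile: Profile) -> List[float]:
    """Concatenate each residue's positive-profile and negative-profile frequencies."""
    pos_part = [pos_profile[i].get(aa, 0.0) for i, aa in enumerate(window)]
    neg_part = [neg_profile[i].get(aa, 0.0) for i, aa in enumerate(window)]
    return pos_part + neg_part


if __name__ == "__main__":
    pos = position_profile(["ACG", "ACT"])  # toy binding windows (hypothetical)
    neg = position_profile(["GGG", "TCA"])  # toy non-binding windows (hypothetical)
    print(bi_profile_features("ACG", pos, neg))  # [1.0, 1.0, 0.5, 0.0, 0.5, 0.5]
```

A window of length n yields a 2n-dimensional vector. The profiles should be estimated from the training fold only; computing them on the full dataset would leak test labels into the features.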
Prediction performance, evaluated by both 5-fold and leave-one-out jackknife cross-validation tests on two benchmark datasets consisting of 168 and 227 non-homologous ATP-binding proteins respectively, demonstrates the efficacy of the proposed protocol. Our experimental results also reveal that the structural characteristics of real protein-ATP binding residues differ significantly from those of non-binding residues: for example, binding residues do not show high solvent accessibility propensities, and binding preferentially occurs at the junctions between different secondary structure segments. Furthermore, experiments testing multiple ratios of positive to negative samples show that prediction performance is affected by imbalance in the training datasets. Increasing the dataset size is also shown to improve prediction performance.
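The evaluation setup described above can be sketched as follows, assuming samples are already labelled positive (binding) or negative (non-binding). `undersample` keeps every positive sample and a random subset of negatives at a chosen negative-to-positive ratio, and `k_fold_splits` partitions a list of items into k folds; setting k to the number of items gives a leave-one-out jackknife test. Both function names, the fixed seed, and splitting at the protein level are illustrative assumptions; the abstract does not state which ratios were tested or exactly how folds were formed.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def undersample(positives: Sequence[T], negatives: Sequence[T],
                neg_per_pos: float, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Keep all positives and about neg_per_pos * len(positives) random negatives."""
    rng = random.Random(seed)
    k = min(len(negatives), int(round(neg_per_pos * len(positives))))
    return list(positives), rng.sample(list(negatives), k)  # sampling without replacement


def k_fold_splits(items: Sequence[T], k: int = 5,
                  seed: int = 0) -> List[Tuple[List[T], List[T]]]:
    """Shuffle items and return k (train, test) splits; k == len(items) is leave-one-out."""
    if not 1 < k <= len(items):
        raise ValueError("k must satisfy 1 < k <= len(items)")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # round-robin assignment to folds
    splits = []
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    protein_ids = [f"P{i}" for i in range(12)]  # hypothetical protein IDs
    for train, test in k_fold_splits(protein_ids, k=5):
        print(len(train), len(test))
    pos, neg = undersample(range(10), range(100, 200), neg_per_pos=3)
    print(len(pos), len(neg))  # 10 30
```

Splitting by protein rather than by residue keeps residues of one protein out of both the training and test sets of the same fold. To study imbalance, the undersampling step can be rerun inside each training fold for several ratios (for instance, hypothetical values such as 1, 2 and 3) while the test folds are left untouched.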