Nassif Houssam, Al-Ali Hassan, Khuri Sawsan, Keirouz Walid, Page David
Department of Computer Sciences, University of Wisconsin-Madison, USA.
Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, USA.
Inductive Log Program. 2010;5989:149-165. doi: 10.1007/978-3-642-13840-9_14.
Hexoses are simple sugars that play a key role in many cellular pathways and in the regulation of development and disease mechanisms. Current protein-sugar computational models are based, at least partially, on prior biochemical findings and knowledge. They incorporate different parts of these findings in predictive black-box models. We investigate the empirical support for biochemical findings by comparing rules induced by Inductive Logic Programming (ILP) to actual biochemical results. We mine the Protein Data Bank for a representative data set of hexose binding sites, non-hexose binding sites, and surface grooves. We build an ILP model of hexose-binding sites and evaluate our results against several baseline machine learning classifiers. Our method achieves an accuracy similar to that of black-box classifiers while providing insight into the discriminating process. In addition, it confirms wet-lab findings and reveals a previously unreported Trp-Glu amino acid dependency.