Dipartimento di Ingegneria e Scienza dell'Informazione, University of Trento, Trento, Italy.
BMC Bioinformatics. 2014 Apr 12;15:103. doi: 10.1186/1471-2105-15-103.
Protein-protein interactions can be seen as a hierarchical process occurring at three related levels: proteins bind by means of specific domains, which in turn form interfaces through patches of residues. Detailed knowledge of which domains and residues are involved in a given interaction has extensive applications in biology, including a better understanding of the binding process and more efficient drug and enzyme design. However, most current interaction prediction methods do not identify which parts of a protein actually mediate an interaction. They also fail to leverage the hierarchical nature of the problem, ignoring useful information available at the lower levels; those that do exploit it do not generate predictions that are guaranteed to be consistent between levels.
Inspired by earlier ideas of Yip et al. (BMC Bioinformatics 10:241, 2009), in the present paper we view the problem as a multi-level learning task, with one task per level (proteins, domains and residues), and propose a machine learning method that collectively infers the binding state of all object pairs. Our method is based on Semantic Based Regularization (SBR), a flexible and theoretically sound machine learning framework that uses First Order Logic constraints to tie the learning tasks together. We introduce a set of biologically motivated rules that enforce consistent predictions between the hierarchy levels.
We evaluate our method with a standard validation procedure and compare it against the only other existing multi-level prediction technique. The results show that our method substantially outperforms the competitor in several experimental settings, indicating that exploiting the hierarchical nature of the problem can lead to better predictions. In addition, our method is guaranteed to produce predicted interactions that are consistent with the protein-domain-residue hierarchy.
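To make the idea of tying levels together with logic constraints more concrete, the following minimal sketch scores how strongly a set of predictions violates one consistency rule. It is an illustration under stated assumptions, not the method's actual implementation: the rule used here ("a protein pair binds if and only if at least one of its domain pairs binds") is a hypothetical example of an inter-level rule, the use of max as a soft existential quantifier is one common choice among several, and all names (soft_exists, consistency_penalty, hierarchy_penalty, domains_of) are invented for this example.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def soft_exists(scores: List[float]) -> float:
    """Soft 'there exists': the largest score, or 0.0 if there are none.
    (Assumption: max is used as the relaxation of the existential quantifier.)"""
    return max(scores, default=0.0)


def consistency_penalty(parent: float, children: List[float]) -> float:
    """Degree of violation of the hypothetical rule
    'parent pair binds <=> some child pair binds', with scores in [0, 1]."""
    return abs(parent - soft_exists(children))


def hierarchy_penalty(
    protein_scores: Dict[Pair, float],
    domain_scores: Dict[Pair, float],
    domains_of: Dict[Pair, List[Pair]],
) -> float:
    """Sum the rule violations over all protein pairs."""
    total = 0.0
    for pair, score in protein_scores.items():
        children = [domain_scores[d] for d in domains_of.get(pair, [])]
        total += consistency_penalty(score, children)
    return total


# Toy data: the protein pair is predicted to bind (0.9),
# but none of its domain pairs is (0.1 and 0.2).
protein_scores = {("P1", "P2"): 0.9}
domain_scores = {("D1", "D3"): 0.1, ("D2", "D3"): 0.2}
domains_of = {("P1", "P2"): [("D1", "D3"), ("D2", "D3")]}

penalty = hierarchy_penalty(protein_scores, domain_scores, domains_of)
```

In a regularization setting, a term of this kind would be added to the usual supervised loss so that inconsistent predictions across levels are discouraged during training; the same pattern would apply between the domain and residue levels.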