Kundu Kousik, Backofen Rolf
Department of Human Genetics, The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Department of Haematology, University of Cambridge, Cambridge, UK.
Methods Mol Biol. 2017;1555:83-97. doi: 10.1007/978-1-4939-6762-9_6.
The Src homology 2 (SH2) domain is an important subclass of modular protein domains that plays an indispensable role in several biological processes in eukaryotes. SH2 domains specifically bind to the phosphotyrosine residue of their binding peptides to facilitate various molecular functions. To determine the subtle binding specificities of SH2 domains, it is important to understand the mechanisms by which these domains recognize their target peptides in a complex cellular environment. Several attempts have been made to predict SH2-peptide interactions using high-throughput data. However, these high-throughput data are often affected by a low signal-to-noise ratio. Furthermore, the prediction methods have several additional shortcomings, such as the linearity problem and high computational complexity. Thus, computational identification of SH2-peptide interactions using high-throughput data remains challenging. Here, we propose a machine learning approach based on an efficient semi-supervised learning technique for the prediction of 51 SH2 domain-mediated interactions in the human proteome. In our study, we have successfully employed several strategies to tackle the major problems in computational identification of SH2-peptide interactions.