Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, Bihar, 801103, India.
Sci Rep. 2020 Nov 5;10(1):19171. doi: 10.1038/s41598-020-75467-x.
Proteins are the primary building blocks of living organisms. They interact with other proteins and thereby take part in various biological processes. Protein-protein interactions (PPIs) help in predicting, and hence understanding, the functions of proteins and the causes and progression of diseases, and they help in designing new drugs. However, there is a vast gap between the number of available protein sequences and the number of identified protein-protein interactions. To bridge this gap, researchers have proposed several computational methods to reveal the interactions between proteins. These methods rely only on sequence-based information about proteins. With advances in technology, other types of protein information, such as 3D structure information, have become available. Deep learning techniques are now successfully adopted in various domains, including bioinformatics. The current work therefore uses different modalities of proteins, namely 3D structures and sequence-based information, together with deep learning algorithms to predict PPIs. The proposed approach is divided into several phases. We first generate several image representations of each protein from its 3D coordinate information and three attributes of its amino acids (the building blocks of proteins): hydropathy index, isoelectric point, and charge. A pre-trained ResNet50 model, a convolutional neural network architecture, is used to extract features from these representations. Autocovariance and conjoint triad, two widely used sequence-based methods for encoding proteins, are used here to provide a second modality derived from protein sequences. A stacked autoencoder is used to obtain a compact form of this sequence-based information. Finally, the features obtained from the different modalities are concatenated in pairs and fed into a classifier that predicts the label of each protein pair.
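The conjoint triad (CT) encoding can be sketched in a few lines. The following is a minimal, illustrative sketch, not the authors' implementation. It assumes the seven-class grouping of amino acids by dipole and side-chain volume that is commonly used with CT (AGV, ILFP, YMTS, HNQW, RK, DE, C), counts the 7^3 = 343 possible class triads over a sliding window of three residues, and applies the min/max normalization often paired with CT. The function name `conjoint_triad` and the handling of non-standard residues are hypothetical choices.

```python
# Seven amino-acid classes commonly used for the conjoint triad (CT) encoding,
# grouped by dipole moment and side-chain volume (assumed grouping).
CT_CLASSES = {"AGV": 0, "ILFP": 1, "YMTS": 2, "HNQW": 3, "RK": 4, "DE": 5, "C": 6}
AA_TO_CLASS = {aa: cls for group, cls in CT_CLASSES.items() for aa in group}


def conjoint_triad(seq):
    """Return the 343-dimensional CT vector of a protein sequence (hypothetical helper)."""
    # Map residues to class indices. Non-standard letters (e.g. X, U) are simply
    # dropped before forming triads; this is a simplifying assumption.
    classes = [AA_TO_CLASS[aa] for aa in seq.upper() if aa in AA_TO_CLASS]
    counts = [0] * 343
    # Slide a window of three consecutive residues; each class triple (a, b, c)
    # maps to the unique index a*49 + b*7 + c in the range 0..342.
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        counts[a * 49 + b * 7 + c] += 1
    max_count = max(counts)
    if max_count == 0:
        # Fewer than three usable residues: no triads to count.
        return [0.0] * 343
    min_count = min(counts)
    # Min/max normalization: d_i = (f_i - min f) / max f.
    return [(f - min_count) / max_count for f in counts]


# A pair representation can be formed by concatenating the two protein vectors.
pair_features = conjoint_triad("MKTAYIAKQR") + conjoint_triad("GSHMDEVLK")
print(len(pair_features))
```

Concatenating the two 343-dimensional vectors gives a 686-dimensional input for a protein pair; the abstract does not state whether the original work orders or combines the vectors this way.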
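Autocovariance (AC) describes how a physicochemical property of residues correlates with itself at increasing sequence distances (lags). For a sequence of n residues with property values x_1, ..., x_n and mean m, the AC value at lag d is (1/(n - d)) * sum over i = 1..n-d of (x_i - m)(x_{i+d} - m). The sketch below is illustrative only and is not the authors' implementation. It assumes a single property, the Kyte-Doolittle hydropathy index, optionally standardized across the 20 amino acids, whereas AC encodings in the literature typically combine several physicochemical properties. The helper names `standardize` and `autocovariance` and the default maximum lag of 30 are assumptions.

```python
import math

# Kyte-Doolittle hydropathy index. Assumption: a single property stands in for
# the several physicochemical properties a full AC encoding would typically use.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Rescale a property table to zero mean and unit (population) standard deviation."""
    values = list(scale.values())
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {aa: (v - mean) / std for aa, v in scale.items()}


def autocovariance(seq, scale, max_lag=30):
    """Return [AC(1), ..., AC(max_lag)] of one property scale along a sequence."""
    # Translate residues into property values, skipping letters not in the scale.
    x = [scale[aa] for aa in seq.upper() if aa in scale]
    n = len(x)
    if n == 0:
        return [0.0] * max_lag
    mean = sum(x) / n
    features = []
    for lag in range(1, max_lag + 1):
        if lag >= n:
            # Too few residues to form any pair at this lag.
            features.append(0.0)
            continue
        total = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
        features.append(total / (n - lag))
    return features


# Example: 30 AC values per protein; concatenate two proteins for a pair.
scale = standardize(HYDROPATHY)
pair_features = autocovariance("MKTAYIAKQR", scale) + autocovariance("GSHMDEVLK", scale)
print(len(pair_features))
```

With one property and a maximum lag of 30, each protein yields 30 values; using several properties would multiply the vector length by the number of properties.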
We have experimented on the human PPI dataset and the Saccharomyces cerevisiae PPI dataset and compared our results with state-of-the-art deep-learning-based classifiers. The proposed method achieves better results than the existing methods. Extensive experiments on different datasets indicate that our approach of learning and combining features from two different modalities is useful for PPI prediction.