IEEE J Biomed Health Inform. 2021 May;25(5):1832-1838. doi: 10.1109/JBHI.2020.3022806. Epub 2021 May 11.
Proteins are essential macronutrients that carry out a wide range of biochemical activities and biological regulation in living cells. In this work, we present a novel multi-modal approach, named MultiPredGO, for predicting protein functions from two different kinds of information, namely the protein sequence and the protein secondary structure. Our contributions are threefold. First, along with the protein sequence, we learn a feature representation from the protein structure. Second, we develop two different deep learning models that take into account the characteristics of the underlying data patterns of the protein sequence and the protein 3D structure. Third, along with these two modalities, we also use protein interaction information to improve the performance of the proposed model in predicting protein functions. To extract features from the different modalities, we use several variants of the convolutional neural network. Because protein function classes depend on each other, we use a neuro-symbolic hierarchical classification model, which resembles the structure of the Gene Ontology (GO), to predict the dependent protein functions effectively. Finally, to validate the proposed method (MultiPredGO), we compare our results with various uni-modal approaches and with two well-known multi-modal protein function prediction approaches, namely INGA and DeepGO. Results show that the overall performance of the proposed approach in terms of accuracy, F-measure, precision, and recall is better than that of the state-of-the-art methods. MultiPredGO attains average improvements of 13.05% and 30.87% over the best existing approach in the comparison (DeepGO) for cellular component and molecular function, respectively.
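The abstract notes that GO function classes depend on each other and that the classifier mirrors the GO hierarchy. As a rough illustration of what hierarchical consistency can mean, the sketch below propagates scores upward so that a parent term never scores lower than any of its descendants. This is not the authors' neuro-symbolic model; it is a common post-processing rule shown under assumptions. The function name `propagate_scores`, the toy term names, and the scores are all hypothetical, and the hierarchy is assumed to be acyclic.

```python
from typing import Dict, List


def propagate_scores(
    scores: Dict[str, float], children: Dict[str, List[str]]
) -> Dict[str, float]:
    """Make scores consistent with a term hierarchy (assumed acyclic).

    Each term receives the maximum of its own score and the scores of all
    its descendants, so predicting a specific function also implies its
    more general ancestors.
    """
    result: Dict[str, float] = {}

    def visit(term: str) -> float:
        # Memoize so terms with several parents are computed only once.
        if term in result:
            return result[term]
        best = scores.get(term, 0.0)
        for child in children.get(term, []):
            best = max(best, visit(child))
        result[term] = best
        return best

    for term in scores:
        visit(term)
    return result


# Hypothetical toy hierarchy: parent term -> list of child terms.
children = {
    "root": ["binding", "catalytic"],
    "binding": ["protein_binding"],
}
# Hypothetical raw per-term scores from some classifier.
raw = {"root": 0.2, "binding": 0.4, "protein_binding": 0.9, "catalytic": 0.1}

consistent = propagate_scores(raw, children)
print(consistent)
```

In this toy case the high score of the specific term `protein_binding` is carried up to `binding` and `root`, while `catalytic` keeps its own low score because it has no higher-scoring descendants.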