

SENSDeep: An Ensemble Deep Learning Method for Protein-Protein Interaction Sites Prediction.

Author information

Aybey Engin, Gümüş Özgür

Affiliations

Department of Health Bioinformatics, Ege University, 35100, Bornova, Izmir, Turkey.

Rectorate, Marmara University, 34722, Kadıköy, Istanbul, Turkey.

Publication information

Interdiscip Sci. 2023 Mar;15(1):55-87. doi: 10.1007/s12539-022-00543-x. Epub 2022 Nov 8.

Abstract

PURPOSE

Determining which amino acids in a protein interact with other proteins is important for understanding that protein's functional mechanism. Although experimental methods exist for detecting protein-protein interaction sites (PPISs), they are costly, time-consuming, and require expertise. Many computational methods have therefore been proposed to accelerate this research, but they generally fall short of predicting PPISs accurately, so further development in this field is needed.

METHODS

In this study, we introduce a new PPIS prediction method. It is a sequence-based Stacking ENSemble Deep (SENSDeep) learning method whose ensemble model combines a recurrent neural network (RNN), a convolutional neural network (CNN), a GRU sequence-to-sequence model (GRUs2s), a GRU sequence-to-sequence model with an attention layer (GRUs2satt), and a multilayer perceptron (MLP). To improve prediction performance, two embedded features, secondary structure and protein sequence information, are added to the training data set in addition to the twelve existing features.
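The stacking idea can be illustrated with a minimal sketch. It assumes each base model (standing in for the RNN, CNN, GRUs2s, and GRUs2satt submodels) outputs a per-residue interaction probability, and that a logistic-regression meta-learner, trained here by plain batch gradient descent, combines those probabilities into a final score. The abstract does not say which component acts as the meta-learner or how it is trained, so the function names `meta_features`, `train_meta_learner`, and `predict_stacked`, the toy base models, and the learning-rate and epoch values are all hypothetical; this is not the implementation in the SENSDeep repository.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical: a base model maps one residue's feature vector to P(interaction).
BaseModel = Callable[[Sequence[float]], float]


def meta_features(base_models: List[BaseModel], x: Sequence[float]) -> List[float]:
    """Level-1 input for the meta-learner: each base model's probability for x."""
    return [m(x) for m in base_models]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_meta_learner(
    rows: List[List[float]], labels: List[int], lr: float = 1.0, epochs: int = 3000
) -> List[float]:
    """Fit a logistic-regression meta-learner by batch gradient descent on log-loss.

    Returns weights [bias, w_1, ..., w_k] for k base models (assumed meta-learner).
    """
    k = len(rows[0])
    w = [0.0] * (k + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for r, y in zip(rows, labels):
            p = sigmoid(w[0] + sum(wi * ri for wi, ri in zip(w[1:], r)))
            err = p - y  # gradient of log-loss with respect to the logit
            grad[0] += err
            for j, rj in enumerate(r):
                grad[j + 1] += err * rj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def predict_stacked(w: List[float], base_models: List[BaseModel], x: Sequence[float]) -> float:
    """Final stacked probability for one residue."""
    r = meta_features(base_models, x)
    return sigmoid(w[0] + sum(wi * ri for wi, ri in zip(w[1:], r)))


if __name__ == "__main__":
    # Toy example: one informative base model and one that always outputs 0.5.
    base = [lambda x: x[0], lambda x: 0.5]
    xs = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
    ys = [0, 0, 0, 1, 1, 1]
    weights = train_meta_learner([meta_features(base, x) for x in xs], ys)
    print(weights, predict_stacked(weights, base, [0.9]))
```

In a real stacking setup, the meta-learner should be fitted on out-of-fold base-model predictions rather than on predictions for the same residues the base models were trained on; otherwise it learns to trust overfit submodels.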

RESULTS

When trained on the training data set without the two extra features, SENSDeep outperforms other methods in the literature on some of the independent test data sets, particularly in sensitivity, F1, MCC, and AUPRC, with improvements of up to 63.5%, 19.3%, 18.5%, and 11.4%, respectively. The added features are shown to improve the method: with them, SENSDeep achieves almost the same performance using less data as the model trained on the data set without these features. In addition, different sliding-window sizes are evaluated on the data sets, and an optimal window size for SENSDeep is identified. SENSDeep is also compared with structure-based methods, some of which are found to perform better. Using SENSDeep trained on both training data sets, PPIS predictions are also presented for various proteins that are not in these training data sets. Finally, execution times for SENSDeep and its submodels are reported.
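Two ingredients mentioned in the results, the sliding window and the evaluation metrics, can be sketched as follows. The sketch assumes a window of odd size centred on each residue, with zero padding beyond the chain ends, and computes sensitivity, precision, F1, and MCC from thresholded predictions. AUPRC is approximated here by step-wise average precision, which is one common estimator; the paper may compute it differently. The function names `sliding_windows`, `binary_metrics`, and `average_precision` are hypothetical and not taken from the SENSDeep code.

```python
import math
from typing import Dict, List, Sequence


def sliding_windows(features: List[List[float]], w: int) -> List[List[float]]:
    """For residue i, concatenate the feature vectors of residues
    i - w//2 .. i + w//2, zero-padding positions outside the chain."""
    if w % 2 == 0:
        raise ValueError("window size must be odd so it is centred on a residue")
    half = w // 2
    pad = [0.0] * len(features[0])
    out = []
    for i in range(len(features)):
        window: List[float] = []
        for j in range(i - half, i + half + 1):
            window.extend(features[j] if 0 <= j < len(features) else pad)
        out.append(window)
    return out


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, precision, F1, and MCC for binary interface labels (1 = interacting)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom_f1 = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom_f1 if denom_f1 else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "precision": precision, "f1": f1, "mcc": mcc}


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate: mean precision at the rank of each true positive."""
    ranked = sorted(zip(scores, y_true), key=lambda s: -s[0])
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for k, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            hits += 1
            total += hits / k
    return total / n_pos if n_pos else 0.0
```

Trying several values of `w` and keeping the one with the best validation metric is one simple way to choose a window size; the abstract does not state which procedure or metric SENSDeep used.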

AVAILABILITY AND IMPLEMENTATION

https://github.com/enginaybey/SENSDeep.

