Protein secondary structure prediction improved by recurrent neural networks integrated with two-dimensional convolutional neural networks.

Author information

Guo Yanbu, Wang Bingyi, Li Weihua, Yang Bei

Affiliations

* School of Information Science and Engineering, Yunnan University, No. 2 North Cuihu Road, Kunming 650091, P. R. China.

† The Research Institute of Resource Insects, Chinese Academy of Forestry, Bailongsi, Kunming 650224, P. R. China.

Publication information

J Bioinform Comput Biol. 2018 Oct;16(5):1850021. doi: 10.1142/S021972001850021X.

Abstract

Protein secondary structure prediction (PSSP) is an important research field in bioinformatics. The features of a protein sequence can be represented as a matrix with two dimensions: the amino-acid residue (time-step) dimension and the feature vector dimension. Common approaches to predicting secondary structure focus only on the amino-acid residue dimension. However, the feature vector dimension may also contain information useful for PSSP. To integrate the information on both dimensions of the matrix, we propose a hybrid deep learning framework, the two-dimensional convolutional bidirectional recurrent neural network (2C-BRNN), to improve the accuracy of 8-class secondary structure prediction. The framework first extracts discriminative local interactions between amino-acid residues with two-dimensional convolutional neural networks (2DCNNs), and then captures long-range interactions between amino-acid residues with bidirectional gated recurrent units (BGRUs) or bidirectional long short-term memory (BLSTM). Specifically, the 2C-BRNN framework comprises four models: 2DConv-BGRUs, 2DCNN-BGRUs, 2DConv-BLSTM and 2DCNN-BLSTM. The 2DConv- models contain only two-dimensional (2D) convolution operations, whereas the 2DCNN- models contain both 2D convolution and pooling operations. Experiments were conducted on four public datasets. The results show that the proposed 2DConv-BLSTM model performs significantly better than the benchmark models. The experiments also demonstrate that the proposed models extract more meaningful features from the protein feature matrix, and that the feature vector dimension is also useful for PSSP. The code and datasets for the proposed methods are available at https://github.com/guoyanb/JBCB2018/ .
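The abstract describes a pipeline in which a 2D convolution reads the residue × feature matrix and a bidirectional recurrent layer then runs along the residue axis to emit one 8-class label per residue. The following is a minimal, untrained NumPy sketch of a 2DConv-BGRUs / 2DCNN-BGRUs style forward pass, not the authors' implementation (their code is in the repository linked above). Everything concrete here is an assumption: the 42-column feature matrix, four 3×3 filters, hidden size 8, ReLU, pooling applied only along the feature axis so each residue keeps its own output row, the "L" symbol for coil, and all helper names (`conv2d_same`, `max_pool_features`, `bgru`, `build_model`, `forward`). The -BLSTM variants would swap the GRU cell for an LSTM cell.

```python
import numpy as np

rng = np.random.default_rng(0)
Q8 = "HGIEBTSL"  # eight DSSP states; coil written as "L" (labeling assumption)

def conv2d_same(x, kernels, bias):
    """2D convolution over an (L, F) feature matrix: rows = residues, columns = features.

    kernels: (C, kh, kw) with odd kh, kw; zero "same" padding keeps output at (C, L, F).
    """
    C, kh, kw = kernels.shape
    L, F = x.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.empty((C, L, F))
    for c in range(C):
        for i in range(L):
            for j in range(F):
                out[c, i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernels[c]) + bias[c]
    return np.maximum(out, 0.0)  # ReLU

def max_pool_features(h, size=2):
    """Max-pool along the feature axis only (assumption), so residue count stays L."""
    C, L, F = h.shape
    F2 = F // size
    return h[:, :, :F2 * size].reshape(C, L, F2, size).max(axis=3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_gru(d_in, d_h):
    # Stacked weights for the update (z), reset (r) and candidate (n) gates.
    return (rng.normal(0, 0.1, (3 * d_h, d_in)),
            rng.normal(0, 0.1, (3 * d_h, d_h)),
            np.zeros(3 * d_h))

def gru_pass(seq, params):
    """Run one GRU over seq of shape (L, d_in); return hidden states (L, d_h)."""
    W, U, b = params
    d_h = U.shape[1]
    h = np.zeros(d_h)
    outs = []
    for x_t in seq:
        wx = W @ x_t + b
        uh = U[:2 * d_h] @ h
        z = sigmoid(wx[:d_h] + uh[:d_h])
        r = sigmoid(wx[d_h:2 * d_h] + uh[d_h:])
        n = np.tanh(wx[2 * d_h:] + U[2 * d_h:] @ (r * h))
        h = (1.0 - z) * n + z * h
        outs.append(h)
    return np.stack(outs)

def bgru(seq, fwd, bwd):
    """Bidirectional GRU: concatenate forward states with re-reversed backward states."""
    h_f = gru_pass(seq, fwd)
    h_b = gru_pass(seq[::-1], bwd)[::-1]
    return np.concatenate([h_f, h_b], axis=1)

def build_model(n_feat, n_filters=4, d_h=8, pool=False):
    """Random, untrained weights; pool=False ~ 2DConv-, pool=True ~ 2DCNN- variant."""
    d_in = n_filters * (n_feat // 2 if pool else n_feat)
    return {
        "kernels": rng.normal(0, 0.1, (n_filters, 3, 3)),
        "conv_b": np.zeros(n_filters),
        "fwd": init_gru(d_in, d_h),
        "bwd": init_gru(d_in, d_h),
        "W_out": rng.normal(0, 0.1, (len(Q8), 2 * d_h)),
        "b_out": np.zeros(len(Q8)),
        "pool": pool,
    }

def forward(model, x):
    """Map an (L, F) feature matrix to (L, 8) per-residue Q8 probabilities."""
    h = conv2d_same(x, model["kernels"], model["conv_b"])  # (C, L, F)
    if model["pool"]:
        h = max_pool_features(h)                           # (C, L, F // 2)
    L = x.shape[0]
    seq = h.transpose(1, 0, 2).reshape(L, -1)              # one vector per residue
    H = bgru(seq, model["fwd"], model["bwd"])              # (L, 2 * d_h)
    logits = H @ model["W_out"].T + model["b_out"]
    logits -= logits.max(axis=1, keepdims=True)            # numerically stable softmax
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    x = rng.normal(size=(20, 42))  # hypothetical: 20 residues x 42 features
    for pool in (False, True):
        probs = forward(build_model(42, pool=pool), x)
        print(probs.shape, "".join(Q8[k] for k in probs.argmax(axis=1)))
```

Run as a script, it prints a (20, 8) probability shape for each variant plus a Q8 string that is arbitrary, since the weights are random. The design point the sketch tries to show is that the 2D kernel slides along both axes at once, mixing neighboring residues and neighboring feature columns before the recurrent layer sees them; the backward GRU is what lets position 0 depend on residues near the C-terminus.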