School of Information Science and Engineering, Yunnan University, Kunming, 650091, China.
Research Institute of Resource Insects, Chinese Academy of Forestry, Kunming, 650224, China.
BMC Bioinformatics. 2019 Jun 17;20(1):341. doi: 10.1186/s12859-019-2940-0.
Protein secondary structure (PSS) is critical for predicting tertiary structure, understanding protein function, and designing drugs. However, experimental techniques for determining PSS are time-consuming and expensive, so there is an urgent need for efficient computational approaches that predict PSS from sequence information alone. Moreover, the feature matrix of a protein has two dimensions: the amino-acid residue dimension and the feature vector dimension. Existing deep learning based methods have achieved remarkable performance in PSS prediction, but they often use features from the amino-acid residue dimension only. Thus, there is still room to improve computational methods for PSS prediction.
We propose a novel deep neural network method, called DeepACLSTM, to predict 8-category PSS from protein sequence features and profile features. Our method combines asymmetric convolutional neural networks (ACNNs) with bidirectional long short-term memory (BLSTM) neural networks to predict PSS, leveraging the feature vector dimension of the protein feature matrix. In DeepACLSTM, the ACNNs extract the complex local contexts of amino acids, and the BLSTM neural networks capture the long-distance interdependencies between amino acids. The prediction module then predicts the category of each amino-acid residue based on both local contexts and long-distance interdependencies. To evaluate the performance of DeepACLSTM, we conduct experiments on three publicly available datasets: CB513, CASP10 and CASP12. Results indicate that our method outperforms the state-of-the-art baselines on the three public datasets.
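The idea of convolving along both dimensions of the feature matrix can be illustrated with a minimal sketch. It assumes an asymmetric convolution that is factorized into a kernel spanning the feature vector dimension followed by a kernel sliding along the residue dimension. All function names, kernel sizes, and weights here are hypothetical and chosen for illustration; this is not the DeepACLSTM implementation, and it omits the multiple filters, nonlinearities, BLSTM layers, and 8-category prediction module a real model would need.

```python
# Illustrative sketch only: names, shapes, and weights are assumptions,
# not taken from the DeepACLSTM code.

def conv_feature_dim(x, w):
    """1 x d kernel: mix the d features of each residue into one value.

    x is a list of L rows (one per residue), each holding d features;
    w holds d weights.
    """
    return [sum(f * wc for f, wc in zip(row, w)) for row in x]


def conv_residue_dim(h, v):
    """k x 1 kernel: slide over neighbouring residues with zero padding (k odd)."""
    pad = len(v) // 2
    padded = [0.0] * pad + list(h) + [0.0] * pad
    return [
        sum(padded[i + j] * vj for j, vj in enumerate(v))
        for i in range(len(h))
    ]


def asymmetric_conv(x, w, v):
    """Apply the feature-dimension kernel first, then the residue-dimension kernel."""
    return conv_residue_dim(conv_feature_dim(x, w), v)


# Toy feature matrix: L = 5 residues, d = 3 features per residue (made-up numbers).
x = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0],
    [2.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 2.0, 1.0],
]
w = [0.5, -1.0, 0.25]  # hypothetical weights over the feature vector dimension
v = [1.0, 2.0, 1.0]    # hypothetical weights over a window of k = 3 residues

out = asymmetric_conv(x, w, v)
print(len(out))  # one output value per residue
```

In this sketch the first step looks only across a residue's own features, and the second step looks only across neighbouring residues, so each output value depends on the features of a small window of residues, which is the kind of local context the ACNNs are described as extracting.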
Experiments demonstrate that DeepACLSTM is an efficient method for predicting 8-category PSS and is able to extract more complex sequence-structure relationships between amino-acid residues. Moreover, experiments also indicate that the feature vector dimension contains useful information for improving PSS prediction.