Signal Processing Laboratory, Griffith University, Brisbane, QLD 4122, Australia.
Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, QLD 4222, Australia.
J Chem Inf Model. 2018 Sep 24;58(9):2033-2042. doi: 10.1021/acs.jcim.8b00442. Epub 2018 Aug 29.
It has long been established that cis conformations of amino acid residues play many biologically important roles despite their rare occurrence in protein structures. Because of this rarity, few methods have been developed for predicting cis isomers from protein sequences, and most of those are based on outdated datasets and lack the means for independent testing. In this work, using a database of >10,000 high-resolution protein structures, we update the statistics of cis isomers and develop a sequence-based prediction technique using an ensemble of residual convolutional and long short-term memory bidirectional recurrent neural networks that allows learning from the whole protein sequence. We show that ensembling eight neural network models yields maximum Matthews correlation coefficient values of approximately 0.35 for cis-Pro isomers and 0.1 for cis-nonPro residues. The method should be useful for prioritizing functionally important residues in cis isomers for experimental validation and for improving the sampling of rare protein conformations in ab initio protein structure prediction.
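The evaluation described in the abstract, combining several models and scoring the result with the Matthews correlation coefficient (MCC), can be illustrated with a minimal sketch. The abstract does not state how the eight models are combined, so simple averaging of per-residue probabilities is an assumption here, as are the 0.5 decision threshold and the helper names `ensemble_average`, `predict_cis`, and `mcc`. The MCC formula itself is the standard one, computed from the four confusion-matrix counts, which is why it remains informative when the positive class (cis residues) is rare.

```python
import math


def ensemble_average(prob_lists):
    """Average per-residue cis probabilities across models.

    prob_lists holds one list per model, all of equal length.
    Averaging is an assumed combination rule, not one taken from the paper.
    """
    n_models = len(prob_lists)
    return [sum(column) / n_models for column in zip(*prob_lists)]


def predict_cis(prob_lists, threshold=0.5):
    """Label a residue cis (1) when the averaged probability reaches the threshold.

    The default threshold of 0.5 is an illustrative assumption.
    """
    return [1 if p >= threshold else 0 for p in ensemble_average(prob_lists)]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = cis, 0 = trans)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, report 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Two hypothetical models scoring four residues (made-up numbers).
    model_probs = [
        [0.9, 0.2, 0.4, 0.1],
        [0.7, 0.4, 0.8, 0.3],
    ]
    labels = [1, 0, 0, 0]
    predictions = predict_cis(model_probs)
    print(predictions, round(mcc(labels, predictions), 3))
```

In practice the threshold would be tuned on held-out data, since with a rare positive class the value that maximizes MCC is often not 0.5.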