IEEE/ACM Trans Comput Biol Bioinform. 2022 Jan-Feb;19(1):593-604. doi: 10.1109/TCBB.2020.3005972. Epub 2022 Feb 3.
Because protein 3D structure prediction is important for biochemical research and drug design, researchers have developed many machine learning algorithms that predict protein 3D structures from sequence information alone. Understanding the sequence-to-structure relationship is key to successful structure prediction. Previous approaches, including single shallow learning models, single deep learning models, and clustering algorithms, all have limitations in capturing the precise sequence-to-structure relationship. To further improve the performance of local protein structure prediction, a novel deep learning model called Clustering Recurrent Neural Network (CRNN) is proposed. In this model, the whole protein dataset is divided into multiple cluster subtrees. An RNN is trained for each cluster in the subtrees, so that each RNN learns a computationally simpler local sequence-to-structure relationship instead of attempting to capture the global one. After the local sequence-to-structure relationships are learned, CRNN predicts distance matrices, torsion angles, and secondary structures for the backbone α-carbon atoms of protein sequence segments. Our experimental analysis indicates that its 3D structure prediction accuracy is comparable to or better than that of other state-of-the-art approaches.
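The cluster-then-specialize idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cluster subtrees are flattened to a fixed list of centroids, segments are assigned to the nearest centroid by Euclidean distance, and a trivial mean predictor (the hypothetical `MeanModel`) stands in for each per-cluster RNN. All names, feature vectors, and targets below are assumptions made for illustration.

```python
import math
from collections import defaultdict


def nearest(centroids, x):
    """Return the index of the centroid closest to x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(centroids[k], x))


class MeanModel:
    """Placeholder for a per-cluster RNN (assumption): predicts the mean
    of the training targets seen in its own cluster."""

    def fit(self, targets):
        self.value = sum(targets) / len(targets)
        return self

    def predict(self, x):
        return self.value


def train_per_cluster(features, targets, centroids):
    """Group training targets by nearest centroid, then fit one model per group."""
    groups = defaultdict(list)
    for x, y in zip(features, targets):
        groups[nearest(centroids, x)].append(y)
    return {k: MeanModel().fit(ys) for k, ys in groups.items()}


def predict(models, centroids, x):
    """Route a query segment to its cluster's model and return that prediction."""
    return models[nearest(centroids, x)].predict(x)


# Toy data (made up): 2-D feature vectors for sequence segments, scalar targets.
features = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
targets = [1.0, 3.0, 10.0, 12.0]
centroids = [(0.0, 0.0), (5.0, 5.0)]

models = train_per_cluster(features, targets, centroids)
print(predict(models, centroids, (0.1, 0.1)))
print(predict(models, centroids, (4.8, 5.2)))
```

Each local model only has to fit the segments routed to its own cluster, which is the sense in which the local sequence-to-structure relationship is simpler than the global one. A real system would replace `MeanModel` with a trained RNN and the fixed centroids with the learned cluster hierarchy.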