Gormez Yasin, Aydin Zafer
IEEE/ACM Trans Comput Biol Bioinform. 2023 Mar-Apr;20(2):1104-1113. doi: 10.1109/TCBB.2022.3191395. Epub 2023 Apr 3.
Protein secondary structure, solvent accessibility and torsion angle predictions are preliminary steps in predicting the 3D structure of a protein. Deep learning approaches have achieved significant improvements in predicting various features of protein structure. In this study, IGPRED-Multitask is proposed: a deep learning model with a multi-task learning architecture based on a deep inception network, a graph convolutional network and a bidirectional long short-term memory network. Moreover, the hyper-parameters of the model are fine-tuned using Bayesian optimization, which is faster and more effective than grid search. For fair comparison with the literature, the same benchmark test data sets as in the OPUS-TASS paper are used, including TEST2016, TEST2018, CASP12, CASP13, CASPFM, HARD68, CAMEO93 and CAMEO93_HARD, together with the same training and validation sets. Compared to the state-of-the-art methods, statistically significant improvements are observed in secondary structure prediction on 4 datasets, in phi angle prediction on 2 datasets and in psi angle prediction on 3 datasets. For solvent accessibility prediction, only the TEST2016 and TEST2018 datasets are used to assess the performance of the proposed model.
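The abstract does not say how the phi and psi predictions are scored. As a rough illustration only, torsion angles are periodic, so a prediction of 179 degrees against a true value of -179 degrees is off by 2 degrees, not 358. The sketch below assumes a mean absolute error with wrap-around; the function names `angular_error_deg` and `mean_angular_error` are hypothetical and are not taken from the paper or from IGPRED-Multitask.

```python
def angular_error_deg(pred, true):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    # Reduce the raw difference to [0, 360), then take the shorter way round.
    diff = abs(pred - true) % 360.0
    return min(diff, 360.0 - diff)


def mean_angular_error(preds, trues):
    """Mean wrap-around absolute error over paired angle lists (assumed metric)."""
    if len(preds) != len(trues):
        raise ValueError("preds and trues must have the same length")
    if not preds:
        raise ValueError("at least one angle pair is required")
    total = sum(angular_error_deg(p, t) for p, t in zip(preds, trues))
    return total / len(preds)


if __name__ == "__main__":
    # Made-up phi angles for two residues, purely for illustration.
    predicted = [179.0, -60.0]
    observed = [-179.0, -65.0]
    print(mean_angular_error(predicted, observed))
```

A plain absolute difference would report 358 degrees for the first pair, which is why the wrap-around step matters when comparing angle predictors.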