School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China.
Department of Computer Science, Jiangnan University, No. 1800 Lihu Avenue, Wuxi, 214122, China.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab001.
As an essential task in protein structure and function prediction, protein fold recognition has attracted increasing attention. Most existing machine learning-based protein fold recognition approaches rely heavily on handcrafted features that depict the characteristics of different protein folds; however, developing effective feature extraction methods remains the bottleneck for further improving protein fold recognition performance. As a powerful feature extractor, a deep convolutional neural network (DCNN) can automatically extract discriminative features for fold recognition without human intervention, and it has demonstrated impressive performance on protein fold recognition. Despite this encouraging progress, a DCNN often acts as a black box, which makes it challenging for users to understand what happens inside the network and why it works well for protein fold recognition. In this study, we explore the intrinsic mechanism of DCNNs and use a visual explanation technique to explain why they work for protein fold recognition. More specifically, we first trained a VGGNet-based DCNN model, termed VGGNet-FE, which can extract fold-specific features from the predicted protein residue-residue contact map for protein fold recognition. Based on the trained VGGNet-FE, we then implemented a new contact-assisted predictor for protein fold recognition, termed VGGfold. Next, we used a deconvolution technique to visualize the features extracted by each convolutional layer of VGGNet-FE. Furthermore, we visualized the high-level semantic information of a predicted contact map, termed the fold-discriminative region, using the localization map obtained from the last convolutional layer of VGGNet-FE. The visualizations confirm that VGGNet-FE effectively extracts distinct fold-discriminative regions for different types of protein folds, which accounts for the improved performance of VGGfold in protein fold recognition.
In summary, this study is of great significance both for understanding the working principle of DCNNs in protein fold recognition and for exploring the relationship between the predicted protein contact map and protein tertiary structure. The proposed visualization method is flexible and can be applied to other DCNN-based bioinformatics and computational biology questions. The VGGfold web server is freely available at http://csbio.njust.edu.cn/bioinf/vggfold/.
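The abstract does not specify how the localization map is computed from the last convolutional layer of VGGNet-FE. Purely as an illustration, the sketch below assumes a Grad-CAM-style computation, which the abstract does not name: each channel of the last convolutional layer is weighted by its spatially averaged gradient with respect to the target fold score, the weighted channels are summed, and a ReLU keeps only positively contributing regions. The map is then resized to the L x L contact map so that a fold-discriminative region can be overlaid on it. The function names `grad_cam_map` and `upsample_nearest` are hypothetical, and the activations and gradients are taken as given inputs; obtaining them from a trained network would require a deep learning framework and is out of scope here.

```python
from typing import List

Matrix = List[List[float]]


def grad_cam_map(activations: List[Matrix], gradients: List[Matrix]) -> Matrix:
    """Hypothetical Grad-CAM-style localization map (an assumption, not VGGfold's confirmed method).

    activations[k][i][j]: activation of channel k at position (i, j) in the last conv layer.
    gradients[k][i][j]:   gradient of the target fold score w.r.t. activations[k][i][j].
    """
    if len(activations) != len(gradients) or not activations:
        raise ValueError("need the same, non-zero number of activation and gradient channels")
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for a_k, g_k in zip(activations, gradients):
        # Channel weight alpha_k: global average of the gradient over all spatial positions.
        alpha_k = sum(sum(row) for row in g_k) / (height * width)
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha_k * a_k[i][j]
    # ReLU: keep only positions whose features push the target fold score up.
    cam = [[max(v, 0.0) for v in row] for row in cam]
    # Scale to [0, 1] so the map can be displayed as a heat map.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


def upsample_nearest(cam: Matrix, size: int) -> Matrix:
    """Nearest-neighbour resize of an h x w map to size x size (e.g. an L x L contact map)."""
    h, w = len(cam), len(cam[0])
    return [[cam[i * h // size][j * w // size] for j in range(size)] for i in range(size)]


if __name__ == "__main__":
    # Toy example: two 2 x 2 channels; the second channel lowers the target score.
    acts = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [0.0, 0.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
    cam = grad_cam_map(acts, grads)
    print(cam)                       # [[1.0, 0.0], [0.0, 1.0]]
    print(upsample_nearest(cam, 4))  # 4 x 4 map aligned with a toy 4-residue contact map
```

Nearest-neighbour resizing is used only to keep the sketch dependency-free; a real pipeline would more likely use bilinear interpolation when mapping a coarse convolutional grid back onto the contact map.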