School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China.
Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, Victoria, 3800, Australia.
Anal Biochem. 2022 Aug 15;651:114695. doi: 10.1016/j.ab.2022.114695. Epub 2022 Apr 26.
Protein fold recognition is a critical step in protein structure and function prediction, and aims to ascertain the most likely fold type of a query protein. Because it is a typical pattern recognition problem, the key to improving protein fold recognition is to design a powerful feature extractor and metric function that extract relevant, representative fold-specific features from protein sequences. In this study, we propose an effective sequence-based approach, called RattnetFold, to identify protein fold types. The basic concept of RattnetFold is to employ a stacked convolutional neural network with an attention mechanism as a feature extractor that learns fold-specific features from protein residue-residue contact maps. Moreover, we leverage metric learning to project these fold-specific features into a subspace where similar proteins are closer together, and name this approach RattnetFoldPro. Benchmarking experiments illustrate that RattnetFold and RattnetFoldPro enable the convolutional neural networks to efficiently learn the underlying subtle patterns in residue-residue contact maps, thereby improving the performance of protein fold recognition. An online web server of RattnetFold and the benchmark datasets are freely available at http://csbio.njust.edu.cn/bioinf/rattnetfold/.
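Protein structure prediction is an important research direction in biology and biomedicine, and is of great significance for understanding protein function, drug design, and disease diagnosis. Within protein structure prediction, recognizing the protein fold type is a key step. This paper proposes a convolutional neural network-based method, called RattnetFold, for identifying protein fold types. The method uses a convolutional neural network to extract fold-specific features from protein residue contact maps, and uses metric learning to project these features into a subspace in which similar proteins lie closer together. Benchmark experiments show that RattnetFold can effectively learn the underlying patterns in residue contact maps, thereby improving protein fold recognition performance. An online server is also provided so that users can run protein fold type predictions with RattnetFold.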
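To make the idea of a subspace "where similar proteins are closer together" concrete, the following is a minimal sketch of how a fold label could be assigned once fold-specific feature vectors exist. It is not the authors' implementation: the abstract does not state which metric or decision rule is used, so cosine similarity and the nearest-template rule are assumptions, and the names `cosine_similarity`, `assign_fold`, the fold labels, and the toy 3-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def assign_fold(query, templates):
    """Return (fold_label, score) of the template most similar to the query.

    templates is a list of (fold_label, feature_vector) pairs.
    Assumption: the nearest template decides the fold; the source does not
    specify the decision rule.
    """
    best_label, best_score = None, -2.0  # cosine similarity is >= -1
    for label, vec in templates:
        score = cosine_similarity(query, vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Hypothetical embeddings standing in for learned fold-specific features.
templates = [
    ("fold_A", [1.0, 0.0, 0.0]),
    ("fold_B", [0.0, 1.0, 0.0]),
    ("fold_A", [0.9, 0.1, 0.0]),
]
query = [0.8, 0.2, 0.0]

label, score = assign_fold(query, templates)
print(label, round(score, 3))
```

In this toy setting the query lies closest to a `fold_A` template, so it receives that label. A learned projection would aim to make such nearest-neighbor decisions more reliable by pulling proteins of the same fold together in the feature space.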