Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee 37203, United States.
Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37203, United States.
J Proteome Res. 2021 Aug 6;20(8):4089-4100. doi: 10.1021/acs.jproteome.1c00410. Epub 2021 Jul 8.
Prediction of residue-level structural attributes and protein-level structural classes helps model protein tertiary structures and understand protein functions. Existing methods are either specialized for only one class of proteins or developed to predict only a specific type of residue-level attribute. In this work, we develop a new deep-learning method, named Membrane Association and Secondary Structure Predictor (MASSP), for accurately predicting both residue-level structural attributes (secondary structure, location, orientation, and topology) and protein-level structural classes (bitopic, α-helical, β-barrel, and soluble). MASSP integrates a multilayer two-dimensional convolutional neural network (2D-CNN) with a long short-term memory (LSTM) neural network into a multitasking framework. Our comparison shows that MASSP performs as well as or better than the state-of-the-art methods in predicting residue-level secondary structures, boundaries of transmembrane segments, and topology. Furthermore, it achieves outstanding accuracy in predicting protein-level structural classes. MASSP automatically distinguishes the structural classes of input sequences and identifies transmembrane segments and topologies if present, making it broadly applicable to different classes of proteins. In summary, MASSP's good performance and broad applicability make it well suited for annotating residue-level attributes and protein-level structural classes at the proteome scale.
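The abstract describes the architecture only at a high level: a multilayer 2D-CNN feeding an LSTM, with several prediction tasks sharing one framework. The pure-Python sketch below illustrates one way such a shared-encoder, multi-head design could be wired. It assumes a per-residue feature matrix as input, single-channel convolutions, a unidirectional LSTM, mean pooling for the protein-level head, and random untrained weights. The class names, task sizes, and pooling choice are hypothetical and are not MASSP's actual implementation.

```python
import math
import random


def relu(v):
    return max(0.0, v)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def conv2d_same(x, kernel):
    """Zero-padded single-channel 2D convolution (cross-correlation, as in most
    deep-learning libraries) over an L x F residue-by-feature map, then ReLU."""
    L, F, k = len(x), len(x[0]), len(kernel)
    pad = k // 2
    out = [[0.0] * F for _ in range(L)]
    for i in range(L):
        for j in range(F):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    ii, jj = i + a - pad, j + b - pad
                    if 0 <= ii < L and 0 <= jj < F:
                        acc += kernel[a][b] * x[ii][jj]
            out[i][j] = relu(acc)
    return out


class LSTM:
    """Minimal unidirectional LSTM; each gate acts on [x_t, h_{t-1}]."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        self.W = {gate: rand_matrix(n_hidden, n_in + n_hidden, rng) for gate in "ifog"}
        self.b = {gate: [0.0] * n_hidden for gate in "ifog"}

    def run(self, xs):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        outputs = []
        for x in xs:
            z = x + h  # concatenate input and previous hidden state
            pre = {gate: [p + q for p, q in zip(matvec(self.W[gate], z), self.b[gate])]
                   for gate in "ifog"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            g = [math.tanh(v) for v in pre["g"]]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
            outputs.append(h)
        return outputs


class MultiTaskSketch:
    """Hypothetical shared CNN+LSTM encoder with per-residue and per-protein heads."""

    def __init__(self, n_features, residue_tasks, n_protein_classes,
                 n_layers=2, k=3, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.kernels = [rand_matrix(k, k, rng) for _ in range(n_layers)]
        self.lstm = LSTM(n_features, n_hidden, rng)
        self.residue_heads = {name: rand_matrix(n, n_hidden, rng)
                              for name, n in residue_tasks.items()}
        self.protein_head = rand_matrix(n_protein_classes, n_hidden, rng)

    def forward(self, x):
        fmap = x
        for kernel in self.kernels:  # stacked ("multilayer") 2D convolutions
            fmap = conv2d_same(fmap, kernel)
        hs = self.lstm.run(fmap)     # one hidden vector per residue
        residue_out = {name: [softmax(matvec(W, h)) for h in hs]
                       for name, W in self.residue_heads.items()}
        pooled = [sum(col) / len(hs) for col in zip(*hs)]  # mean over residues
        protein_out = softmax(matvec(self.protein_head, pooled))
        return residue_out, protein_out


# Hypothetical label-set sizes; the abstract names the tasks but not their labels.
RESIDUE_TASKS = {"secondary_structure": 3, "location": 3, "orientation": 2, "topology": 2}
PROTEIN_CLASSES = ["bitopic", "alpha-helical", "beta-barrel", "soluble"]

rng = random.Random(1)
seq_len, n_features = 10, 5
x = [[rng.uniform(0, 1) for _ in range(n_features)] for _ in range(seq_len)]
model = MultiTaskSketch(n_features, RESIDUE_TASKS, len(PROTEIN_CLASSES))
residue_probs, protein_probs = model.forward(x)
print(PROTEIN_CLASSES[protein_probs.index(max(protein_probs))])
```

The design point the sketch tries to capture is that all heads read from the same encoder output, so the per-residue tasks and the protein-level class are predicted jointly rather than by separate models. With untrained random weights the printed class is meaningless; only the data flow is illustrated.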