Department of Chemistry, Vanderbilt University, Nashville, Tennessee; Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, USA.
Proteins. 2013 Jul;81(7):1127-40. doi: 10.1002/prot.24258. Epub 2013 Apr 10.
Prediction of transmembrane spans and secondary structure from the protein sequence is generally the first step in the structural characterization of (membrane) proteins. The propensity of a stretch of amino acids to form secondary structure and its propensity to be placed in the membrane are correlated. Nevertheless, current methods predict either secondary structure or individual transmembrane states. We introduce a method that simultaneously predicts secondary structure and transmembrane spans from the protein sequence. This approach not only eliminates the need to build a consensus prediction from the possibly contradictory outputs of several predictors but also has the potential to predict conformational switches, i.e., sequence regions that have a high probability of changing, for example, from a coil conformation in solution to an α-helical transmembrane state. An artificial neural network was trained on databases of 177 membrane proteins and 6048 soluble proteins. The output is a 3 × 3 probability matrix for each residue in the sequence that combines three secondary structure types (helix, strand, coil) with three environment types (membrane core, interface, solution). The prediction accuracies are 70.3% for the nine possible states, 73.2% for three-state secondary structure prediction, and 94.8% for three-state transmembrane span prediction. These accuracies are comparable to those of state-of-the-art predictors of secondary structure (e.g., Psipred) or transmembrane placement (e.g., OCTOPUS). The method is available as a web server and for download at www.meilerlab.org.
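To make the per-residue output concrete, the following Python sketch shows one way a 3 × 3 probability matrix could be reduced to a nine-state call and to the two three-state calls. It is an illustration only, not the published implementation: summing rows and columns to obtain the three-state predictions is an assumption, the function names (`marginals`, `predict_states`) are hypothetical, and the probabilities are invented for the example.

```python
# Illustrative sketch only; not the authors' code.
# Rows index secondary structure, columns index environment.
SS_TYPES = ("helix", "strand", "coil")
ENV_TYPES = ("membrane core", "interface", "solution")


def marginals(matrix):
    """Sum the joint 3 x 3 matrix over columns and over rows.

    matrix[i][j] is taken to be P(secondary structure i, environment j)
    for a single residue. Marginalizing is an assumed reduction.
    """
    ss = [sum(row) for row in matrix]
    env = [sum(matrix[i][j] for i in range(3)) for j in range(3)]
    return ss, env


def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda k: values[k])


def predict_states(matrix):
    """Return the most probable nine-state and three-state labels."""
    ss, env = marginals(matrix)
    cells = [(matrix[i][j], i, j) for i in range(3) for j in range(3)]
    _, best_i, best_j = max(cells)
    return {
        "nine_state": (SS_TYPES[best_i], ENV_TYPES[best_j]),
        "secondary_structure": SS_TYPES[argmax(ss)],
        "environment": ENV_TYPES[argmax(env)],
    }


# Invented probabilities for one residue (they sum to 1).
example = [
    [0.55, 0.10, 0.05],  # helix
    [0.02, 0.02, 0.01],  # strand
    [0.05, 0.10, 0.10],  # coil
]
result = predict_states(example)
print(result)
```

Under the same assumption, a candidate conformational switch would appear as a residue with substantial probability in two distant cells at once, for example coil in solution and helix in the membrane core.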