Bhardwaj Nitin, Stahelin Robert V, Langlois Robert E, Cho Wonhwa, Lu Hui
Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA.
J Mol Biol. 2006 Jun 2;359(2):486-95. doi: 10.1016/j.jmb.2006.03.039. Epub 2006 Mar 30.
Membrane-binding peripheral proteins play important roles in many biological processes, including cell signaling and membrane trafficking. Unlike integral membrane proteins, these proteins bind the membrane mostly in a reversible manner. Since peripheral proteins do not have canonical transmembrane segments, it is difficult to identify them from their amino acid sequences. As a first step toward genome-scale identification of membrane-binding peripheral proteins, we built a kernel-based machine learning protocol. Key features of known membrane-binding proteins, including electrostatic properties and amino acid composition, were calculated from their amino acid sequences and tertiary structures, which were then incorporated into the support vector machine to perform the classification. A data set of 40 membrane-binding proteins and 230 non-membrane-binding proteins was used to construct and validate the protocol. Cross-validation and holdout evaluation of the protocol showed that the accuracy of the prediction reached up to 93.7% and 91.6%, respectively. The protocol was applied to the prediction of membrane-binding properties of four C2 domains from novel protein kinases C. Although these C2 domains have 50% sequence identity, only one of them was predicted to bind the membrane, which was verified experimentally with surface plasmon resonance analysis. These results suggest that our protocol can be used for predicting membrane-binding properties of a wide variety of modular domains and may be further extended to genome-scale identification of membrane-binding peripheral proteins.
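The abstract does not define its features precisely. As a rough illustration only, the Python sketch below computes sequence-derived inputs of the kind the protocol describes and defines a kernel that an SVM could use. It builds a 20-dimensional amino acid composition vector and appends a crude net-charge count as a stand-in for the electrostatic features. It then defines an RBF kernel between two feature vectors. The function names, the K/R minus D/E charge rule, and the `gamma` value are assumptions for illustration. This is not the authors' protocol, which also derived electrostatic properties from tertiary structures.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes); order fixes the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each standard amino acid in seq (20-dim vector).

    Non-standard letters are ignored; an empty sequence yields all zeros.
    """
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def net_charge(seq):
    """Hypothetical crude electrostatic proxy: +1 per K/R, -1 per D/E.

    Ignores His, chain termini, pKa shifts and 3D structure entirely.
    """
    s = seq.upper()
    positive = sum(s.count(aa) for aa in "KR")
    negative = sum(s.count(aa) for aa in "DE")
    return positive - negative


def feature_vector(seq):
    """Composition (20 values) followed by net charge (1 value)."""
    return composition(seq) + [float(net_charge(seq))]


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2); gamma is an assumed value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    # Toy sequences, not taken from the paper's data set.
    basic = feature_vector("KKRRDE")
    acidic = feature_vector("DDEEKA")
    print(net_charge("KKRRDE"))          # 4 basic - 2 acidic
    print(rbf_kernel(basic, acidic))     # similarity an SVM would consume
```

In a full pipeline, vectors like these would be fed to an SVM trainer and scored with cross-validation and a holdout set. Given the 40 versus 230 class imbalance, accuracy alone may be worth complementing with per-class metrics.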