Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu, Taiwan, People's Republic of China.
PLoS One. 2012;7(2):e31791. doi: 10.1371/journal.pone.0031791. Epub 2012 Feb 16.
Circular permutation (CP) refers to situations in which the termini of a protein are relocated to other positions in the structure. CP occurs naturally and has been created artificially to study protein function, stability and folding. Recently, CP has been increasingly applied to engineer enzyme structure and function, and to create bifunctional fusion proteins unachievable by tandem fusion. CP is a complicated and expensive technique. An intrinsic difficulty in its application is that not every position in a protein is amenable to creating a viable permutant. To examine the preferences of CP and develop CP viability prediction methods, we carried out comprehensive analyses of the sequence, structural, and dynamical properties of known CP sites using a variety of statistical and simulation methods, such as bootstrap aggregating, permutation tests and molecular dynamics simulations. CP particularly favors Gly, Pro, Asp and Asn. Positions preferred by CP lie within coils, loops and turns, and at residues that are exposed to solvent, weakly hydrogen-bonded, loosely packed, or flexible. Disfavored positions include Cys, bulky hydrophobic residues, and residues located within helices or near the protein's core. These results fostered the development of an effective viable CP site prediction system, which combines four machine learning methods: artificial neural networks, the support vector machine, a random forest, and a hierarchical feature integration procedure developed in this work. Assessed using the dihydrofolate reductase dataset as an independent evaluation dataset, this prediction system achieved an AUC of 0.9. Large-scale predictions have been performed for nine thousand representative protein structures; several new potential applications of CP were thus identified. Many previously unreported preferences of CP are revealed in this study. The developed system is the best CP viability prediction method currently available. This work will facilitate the application of CP in research and biotechnology.
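As an illustration of two of the statistical tools named in the abstract, the following minimal Python sketch shows how a permutation test could assess whether a residue type is enriched at viable CP sites, and how an AUC could be computed for a set of prediction scores. It is not the authors' implementation: the function names, the toy counts (8 of 20 viable sites and 8 of 80 non-viable sites being Gly) and the example scores are hypothetical and chosen only to make the sketch runnable.

```python
import random


def enrichment(labels, is_target):
    """Frequency of the target residue at viable sites minus that at non-viable sites."""
    viable = [t for lab, t in zip(labels, is_target) if lab]
    nonviable = [t for lab, t in zip(labels, is_target) if not lab]
    return sum(viable) / len(viable) - sum(nonviable) / len(nonviable)


def permutation_test(labels, is_target, n_perm=2000, seed=0):
    """One-sided permutation p-value for enrichment of the target residue.

    Viability labels are shuffled across sites; the p-value is the fraction of
    shuffles whose enrichment is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = enrichment(labels, is_target)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if enrichment(shuffled, is_target) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


def auc(scores, labels):
    """AUC as the probability that a viable site outscores a non-viable one (ties count 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 20 viable and 80 non-viable sites; 1 marks a Gly residue.
labels = [1] * 20 + [0] * 80
is_gly = [1] * 8 + [0] * 12 + [1] * 8 + [0] * 72

observed, p_value = permutation_test(labels, is_gly)
print(f"enrichment = {observed:.2f}, permutation p = {p_value:.4f}")

# Hypothetical predictor scores for two viable and two non-viable sites.
print("AUC =", auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

The pairwise AUC above is quadratic in the number of sites, which is acceptable for a sketch; a rank-based formulation would be preferable for large-scale predictions.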