Campen Andrew, Williams Ryan M, Brown Celeste J, Meng Jingwei, Uversky Vladimir N, Dunker A Keith
Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, IN 46202, USA.
Protein Pept Lett. 2008;15(9):956-63. doi: 10.2174/092986608785849164.
Intrinsically disordered proteins carry out various biological functions while lacking ordered secondary and/or tertiary structure. To find general intrinsic properties of amino acid residues that are responsible for the absence of ordered structure in intrinsically disordered proteins, we surveyed 517 amino acid scales. Each of these scales was taken as an independent attribute for the subsequent analysis. For a given attribute value X, averaged over a consecutive string of amino acids, and for a given data set containing both ordered and disordered segments, the conditional probabilities P(s(o) | x) and P(s(d) | x) for order and disorder, respectively, can be determined for all possible values of X. Plotting the conditional probabilities P(s(o) | x) and P(s(d) | x) against X gives a pair of curves. The area between these two curves divided by the total area of the graph gives the area ratio value (ARV), which is proportional to the degree of separation of the two probability curves and therefore measures the given attribute's power to discriminate between order and disorder. Since ARV lies between zero and one, a larger ARV indicates better discrimination between order and disorder. Starting from the scale with the highest ARV, we applied a simulated annealing procedure to search for alternative scale values and increased the ARV by more than 10%. The ranking of the amino acids in this new TOP-IDP scale is as follows (from order promoting to disorder promoting): W, F, Y, I, M, L, V, N, C, T, A, G, R, D, H, Q, K, S, E, P. A web-based server has been created to apply the TOP-IDP scale to predict intrinsically disordered proteins (http://www.disprot.org/dev/disindex.php).
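The abstract describes the ARV and the annealing search only in words, so the following Python sketch shows one possible reading of them. It is not the authors' implementation. The window width (21), the 20 equal-width bins, skipping empty bins, Gaussian single-residue moves and linear cooling are all assumptions, and the function names (`window_averages`, `area_ratio_value`, `scale_arv`, `anneal_scale`) are hypothetical. `RANK_SCALE` uses only the rank positions of the TOP-IDP ordering reported above, not the published TOP-IDP values.

```python
import math
import random

# Hypothetical illustration: the 20 amino acids in the TOP-IDP order given in the
# abstract (order-promoting -> disorder-promoting). The values are rank positions
# only, NOT the published TOP-IDP scale values.
TOP_IDP_ORDER = "WFYIMLVNCTAGRDHQKSEP"
RANK_SCALE = {aa: float(i) for i, aa in enumerate(TOP_IDP_ORDER)}


def window_averages(seq, scale, window=21):
    """X values: mean scale value over every full window (window width is assumed)."""
    values = [scale[aa] for aa in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def area_ratio_value(x_order, x_disorder, n_bins=20):
    """Area between P(order|x) and P(disorder|x) over the plot area (assumed binning)."""
    xs = x_order + x_disorder
    lo, hi = min(xs), max(xs)
    if hi == lo:  # every X identical: the two curves cannot be separated
        return 0.0
    width = (hi - lo) / n_bins
    counts = [[0, 0] for _ in range(n_bins)]  # [ordered, disordered] per bin
    for label, values in ((0, x_order), (1, x_disorder)):
        for x in values:
            counts[min(int((x - lo) / width), n_bins - 1)][label] += 1
    gaps = []
    for n_o, n_d in counts:
        if n_o + n_d:  # empty bins are skipped (assumption)
            p_o = n_o / (n_o + n_d)  # P(order | x); P(disorder | x) = 1 - p_o
            gaps.append(abs(p_o - (1.0 - p_o)))
    # Equal bin widths cancel: (sum of gap * width) / (covered X span * 1.0)
    return sum(gaps) / len(gaps)


def scale_arv(scale, ordered_segs, disordered_segs, window=21):
    """ARV of one scale on labelled segments (each segment at least `window` long)."""
    x_o = [x for s in ordered_segs for x in window_averages(s, scale, window)]
    x_d = [x for s in disordered_segs for x in window_averages(s, scale, window)]
    return area_ratio_value(x_o, x_d)


def anneal_scale(scale, ordered_segs, disordered_segs, steps=1000,
                 t0=0.05, step_size=0.5, window=21, seed=0):
    """Generic simulated annealing over scale values (assumed schedule and moves)."""
    rng = random.Random(seed)
    current = dict(scale)
    cur_arv = scale_arv(current, ordered_segs, disordered_segs, window)
    best, best_arv = dict(current), cur_arv
    for k in range(steps):
        temp = t0 * (1.0 - k / steps) + 1e-9  # linear cooling toward zero
        cand = dict(current)
        aa = rng.choice(sorted(cand))  # perturb one residue's value
        cand[aa] += rng.gauss(0.0, step_size)
        cand_arv = scale_arv(cand, ordered_segs, disordered_segs, window)
        # Metropolis rule: always accept improvements, sometimes accept worse moves
        if cand_arv >= cur_arv or rng.random() < math.exp((cand_arv - cur_arv) / temp):
            current, cur_arv = cand, cand_arv
            if cur_arv > best_arv:
                best, best_arv = dict(current), cur_arv
    return best, best_arv
```

As a check on the two extremes, `scale_arv(RANK_SCALE, ["WFYIMLV" * 6], ["PESKQ" * 8])` returns 1.0 on these made-up, perfectly separated toy segments. Passing the same segment list as both classes gives 0.0. Because P(order | x) is estimated from raw counts here, unequal numbers of ordered and disordered windows will shift the curves. A real analysis would need to decide how to handle that imbalance.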