Possenti Andrea, Vendruscolo Michele, Camilloni Carlo, Tiana Guido
Center for Complexity and Biosystems and Department of Physics, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy.
Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, United Kingdom.
Proteins. 2018 Sep;86(9):956-964. doi: 10.1002/prot.25527.
Proteins use the information stored in the genetic code and translated into their sequences to carry out well-defined functions in the cellular environment. The capacity to encode such functions is governed by the balance between the amount of information supplied by the sequence and the amount left after the protein has folded into its structure. We study the amount of information necessary to specify the protein structure, providing an estimate that takes into account the thermodynamic properties of protein folding. We thus show that the information remaining in the protein sequence after encoding its structure (the 'information gap') is very close to what is needed to encode its function and interactions. Then, by predicting the information gap directly from the protein sequence, we show that it may be possible to use these insights from information theory to discriminate between ordered and disordered proteins, to identify unknown functions, and to optimize artificially designed protein sequences.