Department of Biochemistry and Structural Biology, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, México D.F. 04510, Mexico.
Computer Science Department, CICESE Research Center, Ensenada, Baja California 22860, Mexico.
Molecules. 2017 Oct 9;22(10):1673. doi: 10.3390/molecules22101673.
Protein structure and protein function should be related, yet the nature of this relationship remains unresolved. Mapping the residues critical for protein function onto protein structure features offers an opportunity to explore this relationship, but two important limitations have precluded a proper analysis of the structure-function relationship of proteins: (i) the lack of a formal definition of what critical residues are and (ii) the lack of a systematic evaluation of methods and protein structure features. To address this problem, we introduce an index that quantifies the protein-function criticality of a residue based on experimental data, together with a strategy that optimizes both the descriptors of protein structure (physicochemical and centrality descriptors) and the machine learning algorithms so as to minimize the error in the classification of critical residues. We observed that both physicochemical and centrality descriptors of residues effectively relate protein structure and protein function, and that physicochemical descriptors better describe critical residues. We also show that critical residues are better classified when residue criticality is treated as a binary attribute (i.e., residues are considered either critical or not critical). Using this binary annotation, eight models produced accurate and non-overlapping classifications of critical residues, confirming the multifactorial character of the structure-function relationship of proteins.
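As a rough illustration of the two ingredients named above, the following sketch computes one possible centrality descriptor (closeness centrality on a residue contact network) and binarizes a per-residue criticality score. It is not the method used in the paper: the abstract does not state which centrality measures, distance cutoff, or criticality threshold were used, so the 8.0 Å cutoff, the 0.5 threshold, the toy coordinates, and the function names `contact_graph`, `closeness_centrality`, and `binarize` are all assumptions made for this example.

```python
from collections import deque
from math import dist


def contact_graph(coords, cutoff=8.0):
    """Adjacency lists linking residues whose points lie within `cutoff`.

    `cutoff` (in angstroms) is an assumed value, not taken from the paper.
    """
    n = len(coords)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def closeness_centrality(adj):
    """Closeness per node: reachable nodes / sum of shortest-path lengths (BFS)."""
    result = {}
    for source in adj:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        total = sum(seen.values())
        result[source] = (len(seen) - 1) / total if total else 0.0
    return result


def binarize(scores, threshold=0.5):
    """Label a residue critical (1) when its score reaches the assumed threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Toy input: five residues on a line, 5 angstroms apart (hypothetical data).
coords = [(5.0 * i, 0.0, 0.0) for i in range(5)]
adj = contact_graph(coords)
centrality = closeness_centrality(adj)
labels = binarize([0.1, 0.5, 0.9, 0.3, 0.7])
```

In this toy chain only sequence neighbors are in contact, so the central residue has the highest closeness. In a real analysis the centrality values would be combined with physicochemical descriptors as features for a classifier, with the binary labels as the target.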