Mai Te-Lun, Hu Geng-Ming, Chen Chi-Ming
Department of Physics, National Taiwan Normal University, Taipei, Taiwan.
J Proteome Res. 2016 Jul 1;15(7):2123-31. doi: 10.1021/acs.jproteome.5b01031. Epub 2016 Jun 15.
Research over the past decade has demonstrated the usefulness of protein network knowledge in furthering the study of the molecular evolution of proteins, understanding the robustness of cells to perturbation, and annotating new protein functions. In this study, we aimed to provide a general clustering approach to visualize the sequence-structure-function relationship of protein networks and to investigate possible causes of inconsistency among protein classifications based on sequences, structures, and functions. Such visualization of protein networks could facilitate our understanding of the overall relationship among proteins and help researchers comprehend various protein databases. As a demonstration, we clustered 1437 enzymes by their sequences and structures using the minimum span clustering (MSC) method. The general structure of this protein network was delineated at two clustering resolutions, and the second-level MSC clustering was found to be highly similar to existing enzyme classifications. The clusterings of these enzymes based on sequence, structure, and function information are consistent with one another. For proteases, the Jaccard similarity coefficient is 0.86 between sequence and function classifications, 0.82 between sequence and structure classifications, and 0.78 between structure and function classifications. From our clustering results, we discussed possible examples of divergent evolution and convergent evolution of enzymes. Our clustering approach provides a panoramic view of the sequence-structure-function network of proteins, helps visualize the relationships among related proteins intuitively, and is useful in predicting the structure and function of newly determined protein sequences.
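The abstract reports Jaccard coefficients between classifications but does not say how they were computed. One common way to compare two classifications of the same items is the pair-counting Jaccard index: the number of item pairs grouped together in both classifications, divided by the number of pairs grouped together in at least one. The sketch below assumes that definition; the function name `pair_jaccard` and the toy labels are hypothetical illustrations, not data or code from the study.

```python
from itertools import combinations


def pair_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard index between two flat classifications.

    labels_a[i] and labels_b[i] are the class labels assigned to item i
    by the two classifications being compared.
    ASSUMPTION: this pair-counting definition is one common choice and
    may differ from the exact procedure used in the paper.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")

    both = only_a = only_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1      # pair grouped together in both classifications
        elif same_a:
            only_a += 1    # grouped together only in the first
        elif same_b:
            only_b += 1    # grouped together only in the second

    denom = both + only_a + only_b
    # If no pair is grouped together in either classification, the two
    # classifications agree trivially; return 1.0 by convention.
    return both / denom if denom else 1.0


# Hypothetical toy labels for six proteins (not taken from the study).
sequence_classes = ["S1", "S1", "S1", "S2", "S2", "S3"]
function_classes = ["F1", "F1", "F2", "F2", "F2", "F3"]

score = pair_jaccard(sequence_classes, function_classes)
print(f"pair-counting Jaccard: {score:.3f}")
```

A value of 1 means the two classifications group every pair of items identically, and values near 0 mean they share almost no co-grouped pairs. The loop visits every pair, so the cost grows quadratically with the number of items, which is manageable for a set of about 1437 enzymes.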