González Javier M
Instituto de Bionanotecnología del NOA (INBIONATEC), Consejo Nacional de Investigaciones Científicas y Técnicas, Universidad Nacional de Santiago del Estero (CONICET-UNSE), G4206XCP Santiago del Estero, Argentina.
Heliyon. 2021 Jan 2;7(1):e05867. doi: 10.1016/j.heliyon.2020.e05867. eCollection 2021 Jan.
Protein sequence similarity networks (SSNs) are a convenient approach for analyzing large polypeptide sequence datasets and have been successfully applied to a number of protein families over the past decade. Here, SSN analysis is combined with traditional cladistic and phenetic phylogenetic analyses (based, respectively, on multiple sequence alignments and all-against-all three-dimensional protein structure comparisons) to assist the ancestral reconstruction and integrative revision of the metallo-β-lactamase (MBL) superfamily. Only 198 of 15,292 representative nodes contain at least one experimentally determined protein structure in the Protein Data Bank or a manually annotated SwissProt entry; that is, only 1.3% of the superfamily has been functionally and/or structurally characterized. In addition, neighborhood connectivity coloring, which measures local network interconnectivity, is introduced to detect protein families within SSN clusters. This approach gives a clear picture of how many families in the superfamily remain unexplored, while most MBL research is heavily biased towards a few families. Further research is suggested to determine the topological properties of SSNs, which will be instrumental in improving automated sequence annotation methods.
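To make the neighborhood connectivity measure concrete, the following is a minimal sketch, not the author's implementation. It assumes the usual network-analysis definition, in which the neighborhood connectivity of a node is the mean degree of its direct neighbors; the abstract does not spell out the exact formula. The function name `neighborhood_connectivity` and the toy edge list are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def neighborhood_connectivity(edges):
    """Return {node: mean degree of its neighbors} for an undirected graph.

    Assumed definition (not confirmed by the abstract): the neighborhood
    connectivity of a node is the average degree of its direct neighbors.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adjacency[u].add(v)
        adjacency[v].add(u)

    # Degree = number of distinct neighbors of each node.
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}

    # Every node in `adjacency` has at least one neighbor, so no division by zero.
    return {
        node: sum(degree[n] for n in nbrs) / len(nbrs)
        for node, nbrs in adjacency.items()
    }


# Hypothetical toy network: a triangle (A, B, C) with a pendant chain C-D-E.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
nc = neighborhood_connectivity(toy_edges)
for node in sorted(nc):
    print(node, round(nc[node], 2))
```

In an SSN, each node would be a representative sequence and each edge a pairwise similarity above the chosen threshold. Mapping the resulting values onto a color gradient is one plausible way to obtain the coloring described above, with nodes of similar neighborhood connectivity tending to sit in the same densely connected region of a cluster.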