Andreini Claudia, Banci Lucia, Bertini Ivano, Rosato Antonio
Magnetic Resonance Center (CERM), University of Florence, Via L. Sacconi 6, 50019 Sesto Fiorentino, Italy.
J Proteome Res. 2006 Jan;5(1):196-201. doi: 10.1021/pr050361j.
Metalloproteins are proteins that bind one or more metal ions, which may be required for biological function, for regulation of activity, or for structural purposes. Genome sequencing projects have provided a huge number of protein primary sequences. Yet, although a rich and ever-growing portfolio of bioinformatic tools enables many elaborate analyses and annotations, metal-binding properties remain difficult both to predict and to investigate experimentally. Consequently, present knowledge of metalloproteins is only partial. This bioinformatic study proposes a strategy to answer the question of how many, and which, proteins encoded in the human genome may require zinc for their physiological function. The strategy combines three approaches: (i) searching the proteome for zinc-binding patterns, which are in turn derived from all available X-ray data; (ii) using libraries of metal-binding protein domains, based on multiple sequence alignments of known metalloproteins obtained from the Pfam database; and (iii) mining the annotations of human gene sequences, which are based on any type of information available. Of the proteins in the human proteome, 1684 are independently identified as zinc proteins by all three approaches, 746 by two, and 777 by only one. By assuming that all proteins identified by at least two approaches are truly zinc-binding, and by inspecting the proteins identified by a single method, it can be proposed that ca. 2800 human proteins are potentially zinc-binding in vivo, corresponding to 10% of the human proteome, with an uncertainty of 400 sequences. Available functional information suggests that the large majority of human zinc-binding proteins are involved in the regulation of gene expression.
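The arithmetic behind the consensus counts can be made explicit with a short sketch. It is an illustration only, not the authors' pipeline: the function names, the method labels, and the toy protein IDs are hypothetical, and only the three counts (1684, 746, 777) come from the abstract. Treating "identified by at least two approaches" as a lower bound and "identified by any approach" as an upper bound gives 2430 and 3207 sequences, a range that brackets the reported estimate of ca. 2800 with an uncertainty of 400. The abstract itself attributes the estimate to inspection of the single-method proteins, so this bracket is an observation about the numbers rather than the stated derivation.

```python
from collections import Counter


def tally_by_support(hits_by_method):
    """Return {number of supporting methods: number of proteins}.

    hits_by_method maps a (hypothetical) method label to the set of
    protein IDs that the method flags as zinc-binding.
    """
    support = Counter()
    for ids in hits_by_method.values():
        support.update(set(ids))  # each method votes at most once per protein
    return Counter(support.values())


def consensus_bounds(counts):
    """Lower bound: proteins with >= 2 supporting methods. Upper bound: all."""
    lower = sum(n for k, n in counts.items() if k >= 2)
    upper = sum(counts.values())
    return lower, upper


# Counts reported in the abstract, keyed by number of agreeing approaches.
REPORTED = {3: 1684, 2: 746, 1: 777}

if __name__ == "__main__":
    # Toy input with made-up IDs, to show how the tally is formed.
    toy = {
        "patterns": {"P1", "P2", "P3"},
        "pfam": {"P1", "P2"},
        "annotation": {"P1", "P4"},
    }
    print(dict(tally_by_support(toy)))
    print(consensus_bounds(REPORTED))
```

The set-based vote means a protein flagged several times by the same method still counts once for that method, which is what "independently identified" requires.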
The most abundant class of zinc-binding proteins in humans is that of zinc fingers, with Cys4 and Cys2His2 being the most common types of coordination environment.
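To illustrate what a sequence-pattern search for a Cys2His2 site might look like, the sketch below scans a sequence with a simplified, textbook-style consensus, C-x(2,4)-C-x(12)-H-x(3,5)-H. This pattern is an assumption chosen for illustration; it is not one of the patterns the study derived from X-ray data, and the function name and the test sequence are hypothetical. A match indicates only that the spacing of Cys and His residues is compatible with a Cys2His2 site, not that the protein binds zinc.

```python
import re

# Simplified Cys2His2 spacing: C-x(2,4)-C-x(12)-H-x(3,5)-H (illustrative only).
C2H2_LIKE = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_like(sequence):
    """Return (start index, matched segment) for each non-overlapping match.

    The sequence is expected in one-letter amino acid code; it is upper-cased
    before scanning so lower-case input is also accepted.
    """
    seq = sequence.upper()
    return [(m.start(), m.group()) for m in C2H2_LIKE.finditer(seq)]


if __name__ == "__main__":
    # Synthetic sequence built to contain exactly one site with this spacing.
    example = "MKCPECGKAFSRSDELTRHIRIHTG"
    for start, segment in find_c2h2_like(example):
        print(start, segment)
```

Because `finditer` reports non-overlapping matches, tandem sites that share residues would be undercounted; a production search would need a more careful treatment, along with patterns validated against known structures.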