El Kennani Sara, Adrait Annie, Shaytan Alexey K, Khochbin Saadi, Bruley Christophe, Panchenko Anna R, Landsman David, Pflieger Delphine, Govin Jérôme
INSERM, U1038, CEA, BIG FR CNRS 3425-BGE, Université Grenoble Alpes, Grenoble, France.
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA.
Epigenetics Chromatin. 2017 Jan 10;10:2. doi: 10.1186/s13072-016-0109-x. eCollection 2017.
Histones and histone variants are essential components of nuclear chromatin. While mass spectrometry has greatly advanced their characterization and functional study, identifying them from proteomic data remains challenging. The interpretation of mass spectrometry data currently relies on public databases that are either not exhaustive (Swiss-Prot) or contain many redundant entries (UniProtKB or NCBI). No existing protein database is ideally suited to the analysis of histones and the complex array of mammalian histone variants.
We propose two proteomics-oriented, manually curated databases for mouse and human histone variants. We manually curated more than 1700 gene, transcript and protein entries to produce a non-redundant list of 83 mouse and 85 human histones. These entries were annotated in accordance with the current nomenclature and unified with the "HistoneDB2.0 with Variants" database. The resource is provided in a format that can be read directly by programs used for mass spectrometry data interpretation. It was also used to interpret mass spectrometry data acquired on histones extracted from mouse testis. Several histone variants that had so far only been inferred by homology or detected at the RNA level were detected by mass spectrometry, confirming the existence of their protein form.
Mouse and human histone entries were collected from different databases and subsequently curated to produce a non-redundant protein-centric resource, MS_HistoneDB. It is dedicated to the proteomic study of histones in mouse and human and will hopefully facilitate the identification and functional study of histone variants.
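As a rough illustration of what "non-redundant" means for a sequence database, the sketch below collapses FASTA entries that share an identical amino-acid sequence into a single record. It is not the authors' procedure: the curation described above was manual, and exact-sequence matching is only an assumed first pass. The function names, the `also=` header convention, the entry names and the toy sequences are all hypothetical.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are joined and upper-cased for comparison.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def collapse_identical(records):
    """Group headers by identical sequence, keeping first-seen order."""
    groups = {}
    for header, seq in records:
        groups.setdefault(seq, []).append(header)
    return groups


def to_fasta(groups):
    """Write one record per unique sequence; extra headers become aliases."""
    lines = []
    for seq, headers in groups.items():
        primary, aliases = headers[0], headers[1:]
        # "also=" is a made-up convention for listing merged entries.
        title = primary if not aliases else f"{primary} also={','.join(aliases)}"
        lines.append(f">{title}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


# Toy input: entryA and entryB carry the same (hypothetical) sequence.
EXAMPLE = """>entryA hypothetical
ARTKQT
ARKS
>entryB hypothetical duplicate
artkqtarks
>entryC hypothetical
SGRGKQ
"""

if __name__ == "__main__":
    print(to_fasta(collapse_identical(parse_fasta(EXAMPLE))), end="")
```

Exact matching like this would merge only strictly identical proteins. Histone variants that differ by a single residue stay separate, which is the behaviour a proteomics search needs, but deciding which entries are genuine variants still requires the kind of manual review the abstract describes.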