European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
Open Targets, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
J Proteome Res. 2023 Mar 3;22(3):729-742. doi: 10.1021/acs.jproteome.2c00406. Epub 2022 Dec 28.
The availability of proteomics datasets in the public domain, and in the PRIDE database in particular, has increased dramatically in recent years. This unprecedented large-scale availability of data provides an opportunity to analyze datasets in combination and obtain organism-wide protein abundance data in a consistent manner. We reanalyzed 24 public proteomics datasets from healthy human individuals to assess baseline protein abundance in 31 organs. We defined a tissue as a distinct functional or structural region within an organ. Overall, the aggregated dataset contains 67 healthy tissues, corresponding to 3,119 mass spectrometry runs covering 498 samples from 489 individuals. We compared protein abundances between organs and studied the distribution of proteins across them. We also compared the results with data generated in analogous studies. Additionally, we performed gene ontology and pathway-enrichment analyses to identify the biological processes and pathways enriched in specific organs. Importantly, we have integrated the protein abundance results into the Expression Atlas resource, where they can be accessed and visualized either on their own or together with gene expression data from transcriptomics datasets. We believe this is an effective way to make proteomics data more accessible to life scientists.