Guruceaga Elizabeth, Sanchez del Pino Manuel M, Corrales Fernando J, Segura Victor
Proteomics, Genomics and Bioinformatics Unit, Division of Hepatology and Gene Therapy, Center for Applied Medical Research, University of Navarra, Pamplona 31008, Spain.
J Proteome Res. 2015 Mar 6;14(3):1350-60. doi: 10.1021/pr500850u. Epub 2015 Feb 5.
Experimental evidence for the entire human proteome has been defined in the Human Proteome Project, and it is publicly available in the neXtProt database. However, there are still human proteins for which reliable experimental evidence does not exist, and the identification of such information has become one of the overriding objectives in the chromosome-centric study of the human proteome. With this aim and considering the complexity of protein detection using shotgun and targeted proteomics, the research community has addressed the integration of transcriptomics and proteomics landscapes. Here, we describe an analytical pipeline that predicts the probability of a missing protein being expressed in a biological sample based on (1) gene sequence characteristics, (2) the probability of an expressed gene being a coding gene of a missing protein in a certain sample, and (3) the probability of a gene being expressed in a transcriptomic experiment. More than 3400 microarray experiments were analyzed corresponding to three biological sources: cell lines, normal tissues, and cancer samples. A gene classification based on gene expression profiles distinguished among ubiquitous, nonubiquitous, nonexpressed, and coding genes of missing proteins. In addition, a different tissue-specific expression pattern for the coding genes of missing proteins is reported. Our results underline the relevance of selecting an appropriate sample for the detection of missing proteins and provide a comprehensive method to score their expression probability. Testis, brain, and skeletal muscle are the most promising normal tissues.
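The scoring idea in the abstract, three probabilities feeding one expression score per missing protein and sample, can be illustrated with a minimal sketch. The abstract does not state how the three components are combined, so the plain product used here (a naive independence assumption) is only an illustration and not the authors' method. The names `Evidence`, `expression_score`, and `rank_samples` are hypothetical, and any input values would have to come from the user's own data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical per-gene, per-sample inputs, each a probability in [0, 1]."""
    p_sequence: float                 # (1) derived from gene sequence characteristics
    p_missing_given_expressed: float  # (2) expressed gene codes for a missing protein
    p_expressed: float                # (3) gene is expressed in the transcriptomic experiment


def expression_score(e: Evidence) -> float:
    """Combine the three components into one score.

    ASSUMPTION: a simple product, i.e. the components are treated as
    independent. The source abstract does not specify the combination rule.
    """
    for value in (e.p_sequence, e.p_missing_given_expressed, e.p_expressed):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"probability out of range: {value}")
    return e.p_sequence * e.p_missing_given_expressed * e.p_expressed


def rank_samples(evidence_by_sample: dict[str, Evidence]) -> list[tuple[str, float]]:
    """Order candidate samples from most to least promising for one gene."""
    scored = [(name, expression_score(e)) for name, e in evidence_by_sample.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Ranking samples this way mirrors the abstract's point that choosing an appropriate sample matters for detecting a missing protein; a different combination rule would only require changing `expression_score`.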