Biomedical Omics Group, Korea Basic Science Institute, Cheongju 28119, Republic of Korea.
Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon, Republic of Korea.
J Proteome Res. 2017 Dec 1;16(12):4425-4434. doi: 10.1021/acs.jproteome.7b00223. Epub 2017 Oct 13.
The Human Proteome Project aims to map all human proteins, including missing proteins as well as proteoforms arising from post-translational modifications, alternative splicing variants (ASVs), and single amino acid variants (SAAVs). The neXtProt and Ensembl databases are usually used to provide curated information on human coding genes. To find these proteoforms, we (the Chr #11 team) introduce a streamlined pipeline that uses customized, concatenated neXtProt and GENCODE (derived from Ensembl) databases with a controlled false discovery rate (FDR). Because the databases used in this pipeline are large, we found that more stringent FDR filtering (0.1% at the peptide level and 1% at the protein level) was needed to claim novel findings, such as GENCODE ASVs and missing proteins, from the human hippocampus data set (MSV000081385; ProteomeXchange PXD007166). Using our next generation proteomic pipeline (nextPP) with the neXtProt and GENCODE databases, two missing proteins, activity-regulated cytoskeleton-associated protein (ARC, Chr 8) and glutamate receptor ionotropic, kainate 5 (GRIK5, Chr 19), were additionally identified with two or more unique peptides from human brain tissues. In addition, applying the pipeline to human brain related data sets, namely cortex (PXD000067 and PXD000561), spinal cord, and fetal brain (PXD000561), identified seven GENCODE ASVs in two or more data sets: ACTN4-012 (Chr 19), DPYSL2-005 (Chr 8), MPRIP-003 (Chr 17), NCAM1-013 (Chr 11), EPB41L1-017 (Chr 20), AGAP1-004 (Chr 2), and CPNE5-005 (Chr 6). The identified peptides of the GENCODE ASVs mapped onto novel exon insertions, alternative translation in the 5'-untranslated region, or novel protein coding sequence. Applying the pipeline to male reproductive organ related data sets identified 52 GENCODE ASVs from two testis data sets (PXD000561 and PXD002179) and one spermatozoa data set (PXD003947). Four of the 52 GENCODE ASVs, RAB11FIP5-008 (Chr 2), RP13-347D8.7-001 (Chr X), PRDX4-002 (Chr X), and RP11-666A8.13-001 (Chr 17), were identified in all three samples.
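The two filtering ideas in the abstract (a score cutoff chosen to meet an FDR threshold, and keeping only variants seen in two or more data sets) can be illustrated with a minimal Python sketch. The abstract does not say how nextPP estimates FDR; the target-decoy estimate (decoys divided by targets above the cutoff) is an assumption here, and the function names `filter_by_fdr` and `seen_in_at_least`, the tuple layout, and the toy inputs are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def filter_by_fdr(hits: List[Tuple[str, float, bool]], max_fdr: float) -> List[str]:
    """Return target identifiers accepted at the given FDR threshold.

    Each hit is (identifier, score, is_decoy); higher scores are better.
    ASSUMPTION: FDR is estimated as decoys / targets among hits at or above
    the cutoff (target-decoy approach); the paper's method may differ.
    """
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    targets = decoys = 0
    keep = 0  # length of the longest ranked prefix that satisfies max_fdr
    for i, (_, _, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            keep = i
    return [ident for ident, _, is_decoy in ranked[:keep] if not is_decoy]


def seen_in_at_least(datasets: Dict[str, Set[str]], min_count: int = 2) -> Set[str]:
    """Return identifiers present in at least min_count data sets."""
    counts = Counter(ident for idents in datasets.values() for ident in idents)
    return {ident for ident, n in counts.items() if n >= min_count}
```

With the abstract's thresholds, one would call `filter_by_fdr(peptide_hits, 0.001)` at the peptide level and `filter_by_fdr(protein_hits, 0.01)` at the protein level, then pass the per-data-set results to `seen_in_at_least` to keep variants supported by two or more data sets.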