National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China.
Center for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an 710072, China.
Genomics Proteomics Bioinformatics. 2021 Oct;19(5):772-786. doi: 10.1016/j.gpb.2021.02.002. Epub 2021 Feb 23.
The lack of a complete pig proteome has left a gap in our knowledge of the pig genome and has restricted the feasibility of using pigs as a biomedical model. In this study, we developed a tissue-based proteome map using 34 major normal pig tissues. A total of 5841 unknown protein isoforms were identified and systematically characterized, including 2225 novel protein isoforms, 669 protein isoforms from 460 genes whose symbols begin with LOC, and 2947 protein isoforms without clear NCBI annotation in the current pig reference genome. These newly identified protein isoforms were functionally annotated by profiling the pig transcriptome with high-throughput RNA sequencing of the same pig tissues, further improving the genome annotation of the corresponding protein-coding genes. By combining well-annotated genes that show parallel expression patterns and have subcellular evidence, we predicted the tissue-related subcellular locations and potential functions of these unknown proteins. Finally, we mined 3081 orthologous genes for 52.7% of the unknown protein isoforms across multiple species, involving 68 KEGG pathways as well as 23 disease signaling pathways. These findings provide valuable insights and a rich resource for advancing studies of pig genomics and biology, as well as the application of pigs as biomedical models for human medicine.