Mouratidis Ioannis, Chan Candace S Y, Chantzi Nikol, Tsiatsianis Georgios Christos, Hemberg Martin, Ahituv Nadav, Georgakopoulos-Soares Ilias
Department of Biochemistry and Molecular Biology, Penn State College of Medicine, Hershey, PA, USA.
Department of Engineering Science, KU Leuven, Leuven, Belgium.
NAR Genom Bioinform. 2023 Apr 24;5(2):lqad039. doi: 10.1093/nargab/lqad039. eCollection 2023 Jun.
Determining the organisms present in a biosample has many important applications in agriculture, wildlife conservation, and healthcare. Here, we develop a universal fingerprint based on the identification of short peptides that are unique to a specific organism. We define quasi-prime peptides as sequences that are found in only one species. We analyze proteomes from 21 875 species, from viruses to humans, and annotate the shortest peptide k-mer sequences that are unique to a species and absent from all other proteomes. We also perform simulations across all reference proteomes and observe a lower-than-expected number of peptide k-mers across species and taxonomies, indicating an enrichment for nullpeptides, which are sequences absent from a proteome. For humans, we find that quasi-primes occur in genes enriched for specific gene ontology terms, including proteasome and ATP and GTP catalysis. We also provide a set of quasi-prime peptides for a number of human pathogens and model organisms, and further showcase their utility via two case studies, for […] and […], where we identify quasi-prime peptides in two transmembrane and extracellular proteins with relevance for pathogen detection. Our catalog of quasi-prime peptides provides the smallest unit of information that is specific to a single organism at the protein level, offering a versatile tool for species identification.
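The definition of a quasi-prime peptide can be illustrated with a small sketch. This is not the authors' pipeline, only a minimal in-memory illustration under stated assumptions: each proteome is a list of amino-acid strings, and the function names (`kmers`, `quasi_primes`, `shortest_quasi_primes`), the `max_k` cutoff, and the toy sequences are all hypothetical. A real analysis across thousands of proteomes would need a far more memory-efficient approach.

```python
from collections import defaultdict


def kmers(sequence, k):
    """Return the set of all length-k substrings of a protein sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def quasi_primes(proteomes, k):
    """Map each species to the k-mers found in that species and in no other.

    `proteomes` is a dict: species name -> list of protein sequences.
    (Hypothetical input layout, chosen for illustration only.)
    """
    owners = defaultdict(set)  # k-mer -> set of species containing it
    for species, proteins in proteomes.items():
        for protein in proteins:
            for kmer in kmers(protein, k):
                owners[kmer].add(species)

    unique = defaultdict(set)
    for kmer, species_set in owners.items():
        if len(species_set) == 1:  # present in exactly one species
            (only,) = species_set
            unique[only].add(kmer)
    return dict(unique)


def shortest_quasi_primes(proteomes, species, max_k=7):
    """Return (k, peptides) for the smallest k at which `species` has
    at least one quasi-prime, or None if none is found up to max_k.
    The max_k default is an arbitrary illustrative cutoff."""
    for k in range(1, max_k + 1):
        found = quasi_primes(proteomes, k).get(species, set())
        if found:
            return k, found
    return None


# Toy data, invented purely for illustration.
toy = {
    "species_a": ["MKTAY", "KTAYI"],
    "species_b": ["MKTAW"],
}
result_k3 = quasi_primes(toy, 3)
shortest_a = shortest_quasi_primes(toy, "species_a")
```

In this toy input, the 3-mers `MKT` and `KTA` occur in both species and are therefore excluded, while the 3-mers that contain a residue seen in only one species are retained for that species.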