Peeters Marlies K R, Baggerman Geert, Gabriels Ralf, Pepermans Elise, Menschaert Gerben, Boonen Kurt
BioBix, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium.
Centre for Proteomics, University of Antwerp, Antwerp, Belgium.
Front Cell Dev Biol. 2021 Sep 17;9:720570. doi: 10.3389/fcell.2021.720570. eCollection 2021.
Bioactive peptides play key roles in a wide variety of complex processes, such as regulation of body weight, learning, aging, and the innate immune response. Besides the classical bioactive peptides, which emerge from larger precursor proteins through specific proteolytic processing, a new class of peptides originating from small open reading frames (sORFs) has been recognized as an important group of biological regulators. However, their intrinsic properties, specific expression patterns and location in presumed non-coding regions have hindered full characterization of the bioactive peptide repertoire, despite the prominent role of these peptides in various pathways. Although the development of peptidomics has made it possible to study these peptides, identifying the full peptidome remains challenging, because the lack of a specified cleavage enzyme and the resulting large search space complicate conventional database search approaches. In this study, we introduce a proteogenomics methodology that uses a new type of mass spectrometry instrument and implements machine learning tools to improve the identification of potential bioactive peptides in the mouse brain. Trapped ion mobility spectrometry (TIMS) coupled to a time-of-flight mass analyzer (TOF) offers improved sensitivity, enhanced peptide coverage, reduced chemical noise and fewer chimeric spectra. Subsequently, the machine learning tools MS²PIP, which predicts fragment ion intensities, and DeepLC, which predicts retention times, improve database searching against a large, comprehensive custom database containing both sORFs and alternative ORFs. Finally, peptide identification is further enhanced by applying the semi-supervised post-processing tool Percolator.
Applying this workflow, the first peptidomics workflow to combine spectral intensity and retention time predictions, we identified a total of 167 predicted sORF-encoded peptides (SEPs), 48 of which originate from presumed non-coding locations, in addition to 401 peptides from known neuropeptide precursors, linked to 66 annotated bioactive neuropeptides from 22 different families. An additional PEAKS analysis expanded the pool of SEPs from presumed non-coding locations to 84 and added another 204 peptides to the list of peptides from neuropeptide precursors. Altogether, this study presents a new, robust pipeline that fuses technological advances from different fields to improve coverage of the neuropeptidome in the mouse brain.
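The rescoring idea described above can be illustrated with a minimal sketch. It assumes that predicted fragment intensities (as MS²PIP would produce) and a predicted retention time (as DeepLC would produce) are already available as plain numbers; it does not call either tool. The feature names `intensity_pearson` and `abs_rt_error`, the log2 pseudo-count of 0.001, and the helper functions are hypothetical choices for illustration. Percolator itself trains a semi-supervised classifier on features like these; this sketch does not reproduce that learning step and only shows the simpler target-decoy q-value estimation used to report identifications at a given false discovery rate.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant vector: correlation undefined, treat as 0
    return sxy / math.sqrt(sxx * syy)


def rescoring_features(
    observed: Sequence[float],
    predicted: Sequence[float],
    observed_rt: float,
    predicted_rt: float,
) -> Dict[str, float]:
    """Hypothetical per-PSM features comparing observation and prediction."""
    # Log-transform intensities with a small pseudo-count (assumed value).
    obs_log = [math.log2(i + 0.001) for i in observed]
    pred_log = [math.log2(i + 0.001) for i in predicted]
    return {
        "intensity_pearson": pearson(obs_log, pred_log),
        "abs_rt_error": abs(observed_rt - predicted_rt),
    }


def target_decoy_qvalues(psms: List[Tuple[float, bool]]) -> List[float]:
    """psms: (score, is_decoy) pairs, higher score = better.

    Returns q-values in input order. Tied scores are not treated specially.
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdrs = [0.0] * len(psms)
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when accepting everything scoring at least this high.
        fdrs[i] = decoys / max(targets, 1)
    # q-value: minimum FDR over this threshold and all lower ones.
    qvals = [0.0] * len(psms)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdrs[i])
        qvals[i] = running_min
    return qvals
```

For example, `target_decoy_qvalues([(10, False), (9, False), (8, True), (7, False), (6, True)])` gives q-values of 0, 0, 1/3, 1/3 and 2/3. Taking the running minimum makes q-values monotone in the score, so accepting all PSMs below a q-value cutoff corresponds to a single score threshold.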