Department of Computer Science and Engineering, University of North Texas, TX, USA.
Department of Biomedical Engineering, University of North Texas, TX, USA.
J Proteomics. 2021 Sep 15;247:104316. doi: 10.1016/j.jprot.2021.104316. Epub 2021 Jul 8.
Metaproteomics is increasingly used in microbiome research to gain insight into the functional state of microbial communities. Current metaproteomics studies are generally based on high-throughput tandem mass spectrometry (MS/MS) coupled with liquid chromatography. In this paper, we propose a deep-learning-based algorithm, named DeepFilter, for improving peptide identification from a collection of tandem mass spectra. The key advantage of DeepFilter is that, unlike existing filtering tools, it does not need ad hoc training or fine-tuning. DeepFilter is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/DeepFilter. SIGNIFICANCE: Identifying peptides and proteins from MS data involves searching MS/MS spectra against a predefined protein sequence database and assigning the top-scoring peptides to spectra. Existing computational tools are still far from extracting all the information in MS/MS data sets acquired from metaproteome samples. Systematic experiments demonstrate that, at a false discovery rate of 1%, DeepFilter identified up to 12% more peptide-spectrum matches and up to 9% more proteins than existing filtering algorithms, including Percolator, Q-ranker, PeptideProphet, and iProphet, on marine and soil microbial metaproteome samples. Taxonomic analysis shows that DeepFilter found up to 7%, 10%, and 14% more species in marine, soil, and human gut samples, respectively, than existing filtering algorithms. These results suggest that DeepFilter generalizes well to new, previously unseen peptide-spectrum matches and can be readily applied to peptide identification from metaproteomics data.
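To illustrate what filtering peptide-spectrum matches (PSMs) at a 1% false discovery rate can look like, here is a minimal Python sketch. The abstract does not say how the FDR is estimated, so the sketch assumes the widely used target-decoy approach, in which the FDR above a score cutoff is approximated by the ratio of decoy to target matches. The `PSM` class and `filter_at_fdr` function are hypothetical names introduced for illustration; this is not DeepFilter's code or API, and the score is simply assumed to be "higher is better".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PSM:
    """A peptide-spectrum match (hypothetical container for this sketch)."""
    spectrum_id: str
    peptide: str
    score: float     # assumed: a higher score means a better match
    is_decoy: bool   # True if the peptide came from the decoy database


def filter_at_fdr(psms: List[PSM], fdr_threshold: float = 0.01) -> List[PSM]:
    """Return target PSMs accepted at the given estimated FDR.

    Assumption: FDR is estimated as (#decoys / #targets) among all PSMs
    scoring at least as well as the current one (target-decoy approach).
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = 0
    decoys = 0
    cutoff = 0  # length of the longest ranked prefix that meets the threshold
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= fdr_threshold:
            cutoff = i
    # Report only target matches from the accepted prefix.
    return [p for p in ranked[:cutoff] if not p.is_decoy]
```

In this sketch, a filtering tool's contribution would be the quality of `score`: a score that separates correct from incorrect matches more cleanly lets more target PSMs pass the same 1% cutoff.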