Wen Bo, Hsu Chris, Zeng Wen-Feng, Riffle Michael, Chang Alexis, Mudge Miranda, Nunn Brook, Berg Matthew D, Villén Judit, MacCoss Michael J, Noble William S
Department of Genome Sciences, University of Washington.
Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Germany.
bioRxiv. 2024 Oct 18:2024.10.15.618504. doi: 10.1101/2024.10.15.618504.
Data-independent acquisition (DIA) is an increasingly popular mass spectrometry acquisition strategy for quantitative proteomics experiments. Most popular DIA search engines rely on generated spectral libraries. However, generating high-quality spectral libraries for DIA data analysis remains a challenge, particularly because most such libraries are either built directly from data-dependent acquisition (DDA) data or predicted by models trained on DDA data. In this study, we developed Carafe, a tool that generates high-quality experiment-specific spectral libraries by training deep learning models directly on DIA data. We demonstrate the performance of Carafe on a wide range of DIA datasets, where we observe improved fragment ion intensity prediction and peptide detection relative to existing pretrained DDA models.