Maboudi Afkham Heydar, Qiu Xuanbin, The Matthew, Käll Lukas
Science for Life Laboratory, School of Biotechnology, KTH - Royal Institute of Technology, 17121 Solna, Sweden.
Bioinformatics. 2017 Feb 15;33(4):508-513. doi: 10.1093/bioinformatics/btw619.
Liquid chromatography is frequently used to reduce the complexity of peptide mixtures in shotgun proteomics. For such systems, the time at which a peptide is released from the chromatography column and registered in the mass spectrometer is referred to as the peptide's retention time. Using heuristics or machine learning techniques, previous studies have demonstrated that it is possible to predict the retention time of a peptide from its amino acid sequence. In this paper, we apply Gaussian Process Regression to the feature representation of a previously described predictor, Elude. Using this framework, we demonstrate that it is possible to estimate the uncertainty of the prediction made by the model, and we show how this uncertainty relates to the actual error of the prediction.
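To make the idea of a prediction that carries its own uncertainty concrete, the following is a minimal sketch of Gaussian Process Regression in plain Python. It is not the GPTime implementation: the squared-exponential kernel, the zero-mean prior, the fixed hyperparameters (`length_scale`, `signal_var`, `noise_var`) and all function names are assumptions chosen for illustration, and the one-dimensional toy features stand in for the Elude feature representation.

```python
import math

def rbf(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two feature vectors (assumed kernel)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return signal_var * math.exp(-0.5 * sq_dist / length_scale ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_lower_transposed(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(X_train, y_train, x_new, noise_var=0.01):
    """Return the predictive mean and standard deviation at x_new."""
    n = len(X_train)
    # Kernel matrix of the training points, with observation noise on the diagonal.
    K = [[rbf(X_train[i], X_train[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_lower_transposed(L, solve_lower(L, y_train))
    k_star = [rbf(x, x_new) for x in X_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf(x_new, x_new) + noise_var - sum(vi * vi for vi in v)
    return mean, math.sqrt(max(var, 0.0))

# Toy data: one made-up feature per "peptide" and a made-up retention time.
X_train = [[0.0], [1.0], [2.0]]
y_train = [1.0, 2.0, 3.0]

mean_near, std_near = gp_predict(X_train, y_train, [1.0])   # inside the training range
mean_far, std_far = gp_predict(X_train, y_train, [10.0])    # far from any training point
print(f"near: mean={mean_near:.3f}, std={std_near:.3f}")
print(f"far:  mean={mean_far:.3f}, std={std_far:.3f}")
```

In this sketch the predicted standard deviation is small for a query close to the training data and grows toward the prior standard deviation for a query far from it, which is the property that makes the uncertainty estimate informative about the likely prediction error.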
In our experiments, we observe a strong correlation between the estimated uncertainty provided by Gaussian Process Regression and the actual prediction error. This relation provides us with a new means of assessing the predictions. We demonstrate how to select a subset of the peptides with lower prediction error than the whole set. We also demonstrate how the predicted standard deviations can be used to design adaptive windowing strategies.
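The two uses of the predicted standard deviation mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than the strategy evaluated in the paper: the function names, the threshold `max_std`, the window half-width of two standard deviations, and the numeric values are all hypothetical.

```python
def select_confident(predictions, max_std):
    """Keep only predictions whose estimated standard deviation is at most max_std."""
    return [p for p in predictions if p["std"] <= max_std]

def adaptive_window(mean, std, n_std=2.0):
    """Retention-time window centred on the prediction, scaled by its uncertainty."""
    return (mean - n_std * std, mean + n_std * std)

# Made-up predictions (minutes); peptide names are placeholders, not real data.
predictions = [
    {"peptide": "PEPTIDE_A", "mean": 32.0, "std": 0.5},
    {"peptide": "PEPTIDE_B", "mean": 47.5, "std": 3.0},
    {"peptide": "PEPTIDE_C", "mean": 18.2, "std": 1.0},
]

confident = select_confident(predictions, max_std=1.0)
windows = {p["peptide"]: adaptive_window(p["mean"], p["std"]) for p in predictions}

for name, (start, end) in windows.items():
    print(f"{name}: window {start:.1f} to {end:.1f} min")
```

A fixed-width window would have to be wide enough for the least certain peptide; scaling the width by the predicted standard deviation keeps windows narrow where the model is confident and widens them only where it is not.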
Our software and the data used in our experiments are publicly available and can be downloaded from https://github.com/statisticalbiotechnology/GPTime.