Uncertainty estimation of predictions of peptides' chromatographic retention times in shotgun proteomics.

Authors

Maboudi Afkham Heydar, Qiu Xuanbin, The Matthew, Käll Lukas

Affiliation

Science for Life Laboratory, School of Biotechnology, KTH - Royal Institute of Technology, 17121 Solna, Sweden.

Publication

Bioinformatics. 2017 Feb 15;33(4):508-513. doi: 10.1093/bioinformatics/btw619.

Abstract

MOTIVATION

Liquid chromatography is frequently used to reduce the complexity of peptide mixtures in shotgun proteomics. For such systems, the time at which a peptide is released from the chromatography column and registered in the mass spectrometer is referred to as the peptide's retention time. Using heuristics or machine learning techniques, previous studies have demonstrated that it is possible to predict the retention time of a peptide from its amino acid sequence. In this paper, we apply Gaussian Process Regression to the feature representation of a previously described predictor, Elude. Using this framework, we demonstrate that it is possible to estimate the uncertainty of the prediction made by the model, and we show how this uncertainty relates to the actual error of the prediction.
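To make the idea of a prediction with an attached uncertainty concrete, the following is a minimal sketch of Gaussian Process Regression in plain Python. It is not the authors' implementation: the squared-exponential kernel, the hyperparameter values, the zero-mean prior, and the one-dimensional toy features are all assumptions chosen for illustration, and the Elude feature representation is not reproduced here.

```python
import math

def rbf(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel (an assumed choice) between two feature vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return signal_var * math.exp(-0.5 * sq / length_scale ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_from_lower(L, b):
    """Solve L^T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / L[i][i]
    return x

def gp_predict(X_train, y_train, X_test, noise_var=1e-2):
    """Return (mean, std) of the zero-mean GP posterior over the latent function."""
    n = len(X_train)
    # Kernel matrix of the training points plus observation noise on the diagonal.
    K = [[rbf(X_train[i], X_train[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_from_lower(L, solve_lower(L, y_train))
    results = []
    for x in X_test:
        k_star = [rbf(x, xt) for xt in X_train]
        mean = sum(k * a for k, a in zip(k_star, alpha))
        v = solve_lower(L, k_star)
        var = rbf(x, x) - sum(vi * vi for vi in v)
        results.append((mean, math.sqrt(max(var, 0.0))))
    return results

# Toy data (made up): one feature per "peptide", centred retention times.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 0.8, 0.9, 0.1]
(near_mean, near_std), (far_mean, far_std) = gp_predict(
    X_train, y_train, [[1.5], [10.0]])
```

In this sketch a test point that lies far from all training points receives a predicted standard deviation close to the prior value, while a point surrounded by training data receives a smaller one. That behaviour is what allows the standard deviation to serve as an uncertainty estimate.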

RESULTS

In our experiments, we observe a strong correlation between the estimated uncertainty provided by Gaussian Process Regression and the actual prediction error. This relation provides a new means of assessing the predictions. We demonstrate how a subset of the peptides can be selected that has a lower prediction error than the whole set. We also demonstrate how the predicted standard deviations can be used to design adaptive windowing strategies.
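The two uses of the predicted standard deviation mentioned above can be sketched as follows. This is an illustrative sketch only: the function names, the ranking-by-standard-deviation rule, the window multiplier `k`, the minimum half-width, and the peptide sequences and numbers are all hypothetical and are not taken from the paper or from the GPTime software.

```python
def select_confident(predictions, fraction=0.5):
    """Keep the given fraction of peptides with the smallest predicted std.

    `predictions` is a list of (peptide, predicted_mean, predicted_std) tuples.
    """
    ranked = sorted(predictions, key=lambda p: p[2])
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]

def adaptive_window(mean, std, k=2.0, min_half_width=0.5):
    """Retention-time window whose width scales with the predicted std."""
    half = max(k * std, min_half_width)
    return (mean - half, mean + half)

# Made-up predictions in minutes: (peptide, mean, std).
predictions = [
    ("PEPTIDEK", 30.0, 1.0),
    ("SAMPLER", 12.5, 0.1),
    ("ELVISK", 45.0, 3.0),
    ("LIVESR", 21.0, 0.4),
]

confident = select_confident(predictions, fraction=0.5)
windows = {pep: adaptive_window(mean, std) for pep, mean, std in predictions}
```

Under these assumptions, peptides with a small predicted standard deviation get a narrow window and uncertain ones get a wider window, instead of one fixed window width for every peptide.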

CONTACT

lukas.kall@scilifelab.se.

AVAILABILITY AND IMPLEMENTATION

Our software and the data used in our experiments are publicly available and can be downloaded from https://github.com/statisticalbiotechnology/GPTime.
