A better scoring model for de novo peptide sequencing: the symmetric difference between explained and measured masses.

Authors

Tschager Thomas, Rösch Simon, Gillet Ludovic, Widmayer Peter

Affiliations

Department of Computer Science, ETH Zurich, Universitätstrasse 6, 8092 Zurich, Switzerland.

Department of Biology, ETH Zurich, Auguste-Piccard-Hof 1, 8093 Zurich, Switzerland.

Publication information

Algorithms Mol Biol. 2017 May 11;12:12. doi: 10.1186/s13015-017-0104-1. eCollection 2017.

Abstract

BACKGROUND

Given a peptide as a string of amino acids, the masses of all its prefixes and suffixes can be found by a trivial linear scan through the amino acid masses. The inverse problem is the ideal de novo peptide sequencing problem: given all prefix and suffix masses, determine the string of amino acids. In biological reality, the given masses are measured in a lab experiment, and measurements are necessarily noisy. The (real, noisy) de novo peptide sequencing problem therefore has a noisy input: a few of the prefix and suffix masses of the peptide are missing, and a few other masses are given in addition. For this setting, we ask for an amino acid string that explains the given masses as accurately as possible.
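The linear scan mentioned in the background can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the small table of nominal (integer-rounded) residue masses, and the choice to report only proper prefixes and suffixes are assumptions made for the example, and ion-type mass offsets are ignored.

```python
from itertools import accumulate

# Nominal (integer-rounded) residue masses for a few amino acids.
# Assumption: an illustrative subset, not the mass table used in the paper.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "Q": 128}


def prefix_masses(peptide):
    """Masses of all proper prefixes, computed in one linear scan."""
    running = list(accumulate(RESIDUE_MASS[aa] for aa in peptide))
    return running[:-1]  # drop the mass of the full peptide


def suffix_masses(peptide):
    """Masses of all proper suffixes, shortest suffix first."""
    return prefix_masses(peptide[::-1])


print(prefix_masses("GASP"))  # [57, 128, 215]
print(suffix_masses("GASP"))  # [97, 184, 255]
```

Each suffix mass equals the total peptide mass (312 for GASP here) minus the complementary prefix mass, which is why a single pass over the residue masses is enough.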

RESULTS

Past approaches interpreted accuracy by searching for a string that explains as many masses as possible. We feel, however, that it is not only bad to not explain a mass that appears, but also to explain a mass that does not appear. We propose to minimize the symmetric difference between the set of given masses and the set of masses that the string explains. For this new optimization problem, we propose an efficient algorithm that computes both the best and the k best solutions. Proof-of-concept experiments on measurements of synthesized peptides show that our approach leads to better results compared to finding a string that explains as many given masses as possible.
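To make the difference between the two scoring ideas concrete, the sketch below scores two candidate strings against a toy set of measured masses. It only evaluates given candidates and is not the efficient algorithm from the paper. The function names, the nominal integer masses, the exact-match comparison (real spectra need a mass tolerance), and the toy data are all assumptions for illustration.

```python
from itertools import accumulate

# Nominal residue masses; note that Q (128) has the same nominal mass as G + A.
# Assumption: an illustrative subset, not the mass table used in the paper.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "Q": 128}


def explained_masses(peptide):
    """Set of proper prefix and suffix masses that the string explains."""
    running = list(accumulate(RESIDUE_MASS[aa] for aa in peptide))
    total, proper = running[-1], running[:-1]
    return set(proper) | {total - m for m in proper}


def shared_count(measured, peptide):
    """Older criterion: number of measured masses the string explains (maximize)."""
    return len(measured & explained_masses(peptide))


def symmetric_difference_score(measured, peptide):
    """Proposed criterion: unexplained measured masses plus explained
    masses that were not measured (minimize)."""
    return len(measured ^ explained_masses(peptide))


# Toy measurement: four true fragment masses plus one noise mass (150).
measured = {97, 128, 184, 215, 150}

for candidate in ("GASP", "QSP"):
    print(candidate,
          shared_count(measured, candidate),
          symmetric_difference_score(measured, candidate))
# GASP 4 3
# QSP 4 1
```

Both candidates have total mass 312 and explain four of the measured masses, so counting explained masses cannot separate them. GASP additionally explains 57 and 255, which were not measured, so the symmetric difference prefers QSP.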

CONCLUSIONS

We conclude that considering the symmetric difference as optimization goal can improve the identification rates for de novo peptide sequencing. A preliminary version of this work has been presented at WABI 2016.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1d5/5464308/634bafff3090/13015_2017_104_Fig1_HTML.jpg
