Chiu Yulun, Schliekelman Paul, Orlando Ron, Sharp Joshua S
From the Complex Carbohydrate Research Center and the Institute of Bioinformatics.
Mol Cell Proteomics. 2017 Feb;16(2):255-264. doi: 10.1074/mcp.M116.062588. Epub 2016 Dec 9.
We present a statistical model to estimate the accuracy of derivatized heparin and heparan sulfate (HS) glycosaminoglycan (GAG) assignments to tandem mass (MS/MS) spectra made by the first published database search application, GAG-ID. Employing a multivariate expectation-maximization algorithm, this statistical model distinguishes correct from ambiguous and incorrect database search results when computing the probability that heparin/HS GAG assignments to spectra are correct based upon database search scores. Using GAG-ID search results for spectra generated from a defined mixture of 21 synthesized tetrasaccharide sequences as well as seven spectra of longer defined oligosaccharides, we demonstrate that the computed probabilities are accurate and have high power to discriminate between correctly, ambiguously, and incorrectly assigned heparin/HS GAGs. This analysis makes it possible to filter large MS/MS database search results with predictable false identification error rates.
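The idea of turning search scores into assignment probabilities with expectation-maximization, and then filtering at a chosen error rate, can be sketched as follows. This is a minimal illustration only, not the model used with GAG-ID: the abstract does not specify the score features or the component distributions, so the sketch assumes a three-component Gaussian mixture with diagonal covariance over two hypothetical scores per spectrum, synthetic data, and hand-picked starting means. The function names `em_mixture` and `filter_by_error_rate` are invented for this example.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (an assumed form)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def em_mixture(scores, means, n_iter=50, min_var=1e-3):
    """Fit a K-component diagonal Gaussian mixture by EM.

    Returns one row of posterior component probabilities per score vector.
    """
    k, d, n = len(means), len(scores[0]), len(scores)
    means = [list(m) for m in means]
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each spectrum.
        resp = []
        for x in scores:
            logp = [math.log(weights[j]) + log_gauss_diag(x, means[j], variances[j])
                    for j in range(k)]
            top = max(logp)  # subtract the maximum for numerical stability
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances from the posteriors.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guard against empty components
            weights[j] = nj / n
            for t in range(d):
                means[j][t] = sum(r[j] * x[t] for r, x in zip(resp, scores)) / nj
                spread = sum(r[j] * (x[t] - means[j][t]) ** 2
                             for r, x in zip(resp, scores)) / nj
                variances[j][t] = max(min_var, spread)
    return resp


def filter_by_error_rate(p_correct, max_error=0.05):
    """Keep top-ranked assignments while the expected error rate stays <= max_error."""
    order = sorted(range(len(p_correct)), key=lambda i: -p_correct[i])
    kept, expected_wrong = [], 0.0
    for i in order:
        if (expected_wrong + 1 - p_correct[i]) / (len(kept) + 1) > max_error:
            break
        expected_wrong += 1 - p_correct[i]
        kept.append(i)
    return kept


# Synthetic two-score data for three classes of search result (illustrative only).
random.seed(0)
centers = {"incorrect": (1.0, 1.0), "ambiguous": (4.0, 4.0), "correct": (8.0, 8.0)}
scores, truth = [], []
for label, (cx, cy) in centers.items():
    for _ in range(40):
        scores.append((random.gauss(cx, 0.7), random.gauss(cy, 0.7)))
        truth.append(label)

# Assumption: the component started at the highest scores models correct assignments.
resp = em_mixture(scores, means=[(0.0, 0.0), (4.0, 4.0), (9.0, 9.0)])
p_correct = [r[2] for r in resp]
kept = filter_by_error_rate(p_correct, max_error=0.05)
print(len(kept), "assignments kept")
```

The filter relies on the fact that, if the probabilities are accurate, the expected fraction of wrong assignments in a retained set is the mean of `1 - p_correct` over that set, which is what lets the error rate be predicted before validation.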