Faculty for Mathematics and Computer Science, Friedrich-Schiller-Universität Jena, Ernst-Abbe-Platz 2, Jena 07743, Germany.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Jul-Aug;8(4):976-86. doi: 10.1109/TCBB.2010.129.
Glycans are molecules made from simple sugars that form complex tree structures. Glycans constitute one of the most important protein modifications, and the identification of glycans remains a pressing problem in biology. Unfortunately, the structure of glycans is hard to predict from the genome sequence of an organism. In this paper, we consider the problem of deriving the topology of a glycan solely from tandem mass spectrometry (MS) data. We study how to generate glycan tree candidates that sufficiently match the sample mass spectrum while avoiding the combinatorial explosion of glycan structures. The resulting problem is known to be computationally hard. We present an efficient exact algorithm for this problem, based on fixed-parameter algorithmics, that can process a spectrum in a matter of seconds. We also report preliminary results of our method on experimental data, combining it with a preliminary candidate evaluation scheme. We show that our approach is fast in applications and that it reaches very good de novo identification results. Finally, we show how to count the number of glycan topologies for a fixed size or a fixed mass. We generalize this result to count the number of (labeled) trees with bounded out-degree, improving on results obtained using Pólya's enumeration theorem.
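To make the counting question concrete, the following is a minimal sketch of one straightforward way to count unordered rooted trees with bounded out-degree by number of nodes. It is not the method of the paper, only a plain dynamic program over multisets of subtrees. The function name `count_rooted_trees` and the parameters `max_children` and `num_labels` are hypothetical; `num_labels` stands in for the number of distinct monosaccharide types, and the sketch assumes that any label may appear at any node and that the order of children does not matter.

```python
from math import comb


def count_rooted_trees(max_nodes, max_children, num_labels=1):
    """Return a list `trees` where trees[n] is the number of unordered
    rooted trees with n nodes, at most `max_children` children per node,
    and one of `num_labels` labels on every node (illustrative sketch)."""
    trees = [0] * (max_nodes + 1)
    for n in range(1, max_nodes + 1):
        # ways[m][c]: number of multisets of c subtrees with m nodes in
        # total, built from the subtree sizes processed so far.
        ways = [[0] * (max_children + 1) for _ in range(n)]
        ways[0][0] = 1
        for size in range(1, n):
            kinds = trees[size]  # distinct trees available at this size
            new = [row[:] for row in ways]  # k = 0 copies of this size
            for m in range(n):
                for c in range(max_children + 1):
                    if ways[m][c] == 0:
                        continue
                    k = 1
                    while m + k * size <= n - 1 and c + k <= max_children:
                        # Choose k trees of this size with repetition.
                        new[m + k * size][c + k] += (
                            ways[m][c] * comb(kinds + k - 1, k)
                        )
                        k += 1
            ways = new
        # The root takes any label; its children use the other n - 1 nodes.
        trees[n] = num_labels * sum(ways[n - 1])
    return trees
```

With a single label and at most two children per node, `count_rooted_trees(6, 2)` yields the counts for 1 to 6 nodes at indices 1 to 6. Counting by fixed mass rather than fixed size would require indexing the table by mass instead of node count, which this sketch does not attempt.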