Ning Kang, Ng Hoong Kee, Leong Hon Wai
Department of Computer Science, School of Computing, National University of Singapore, 3 Science Drive 2, 117543, Singapore.
Genome Inform. 2006;17(2):194-205.
Peptide identification by tandem mass spectrometry is both an important and a challenging problem in proteomics. At present, high-throughput mass spectrometers generate huge amounts of spectrum data at a very fast pace, but algorithms to analyze these spectra are too slow, not accurate enough, or give only partial sequences or sequence tags. In this paper, we emphasize the balance between identification completeness and efficiency, with reasonable accuracy, for peptide identification by tandem mass spectrometry. Our method converts spectra to vectors in a high-dimensional space. It then uses a self-organizing map (SOM) and a multi-point range query (MPRQ) algorithm as a coarse filter that reduces the number of candidates, which makes the database search efficient and accurate. Experiments show that our algorithm is both fast and accurate in peptide identification.
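The coarse-filter idea can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the abstract. The spectrum-to-vector encoding here (fixed-width m/z binning followed by L2 normalization) is hypothetical. The SOM index is replaced by a plain linear scan. The names `spectrum_to_vector`, `multi_point_range_query`, `bin_width`, `max_mz`, and `radius`, and the peptide IDs `PEP_A` and `PEP_B`, are all illustrative. The sketch shows only the general shape of the technique: encode spectra as vectors, then keep every database candidate that lies within a distance threshold of at least one query point.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def spectrum_to_vector(peaks: Sequence[Peak], bin_width: float = 1.0,
                       max_mz: float = 2000.0) -> List[float]:
    """Hypothetical encoding: sum intensities into fixed-width m/z bins, then L2-normalize."""
    n_bins = int(math.ceil(max_mz / bin_width))
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        if 0.0 <= mz < max_mz:
            vec[int(mz // bin_width)] += intensity
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def multi_point_range_query(queries: List[List[float]],
                            database: Dict[str, List[float]],
                            radius: float) -> Set[str]:
    """Coarse filter: keep candidates within `radius` of ANY query point.

    This is a linear scan. In the paper a SOM narrows the search space;
    that index is deliberately omitted here.
    """
    return {pid for pid, vec in database.items()
            if any(euclidean(q, vec) <= radius for q in queries)}


# Hypothetical toy database of candidate peptide spectra (peak lists).
db_peaks = {
    "PEP_A": [(100.0, 1.0), (200.0, 1.0), (300.0, 1.0)],
    "PEP_B": [(150.0, 1.0), (250.0, 1.0), (350.0, 1.0)],
}
db = {pid: spectrum_to_vector(p) for pid, p in db_peaks.items()}

# An "observed" spectrum with small m/z and intensity deviations from PEP_A.
observed = [(100.2, 0.9), (200.4, 1.1), (300.1, 1.0)]
queries = [spectrum_to_vector(observed)]

candidates = multi_point_range_query(queries, db, radius=0.5)
print(candidates)  # {'PEP_A'}
```

Only the surviving candidates (here `PEP_A`) would then go through a finer, more expensive scoring step. Because of this two-stage design, the radius trades recall against speed: a larger radius keeps more true matches but passes more candidates to the fine scorer.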