Harwood Thomas V, Treen Daniel G C, Wang Mingxun, de Jong Wibe, Northen Trent R, Bowen Benjamin P
Environmental Genomics and Systems Biology Division, The DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA, 94720, USA.
Department of Computer Science and Engineering, University of California Riverside, 900 University Avenue, Riverside, CA, 92521, USA.
Sci Rep. 2023 Aug 18;13(1):13462. doi: 10.1038/s41598-023-40496-9.
Metabolomics has a long history of using cosine similarity to match experimental tandem mass spectra to databases for compound identification. Here we introduce the Blur-and-Link (BLINK) approach for scoring cosine similarity. By bypassing fragment alignment and simultaneously scoring all pairs of spectra using sparse matrix operations, BLINK is over 3000 times faster than MatchMS, a widely used loop-based alignment and scoring implementation. At a similarity cutoff of 0.7, BLINK and MatchMS produced practically equivalent identifications, and more than 99% of their scores and matching ion counts were identical. This speedup enables calculations that would otherwise be limited by time and available computational resources.
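The idea of scoring without fragment alignment can be illustrated with a minimal pure-Python sketch. It is not the BLINK implementation: dictionaries stand in for a sparse matrix library, and the function names (`normalize`, `to_sparse`, `score_all_pairs`), the bin width, the choice to blur only the query side, and the example spectra are all assumptions made for illustration. Each spectrum is normalized, its m/z values are binned to integers, query intensities are copied ("blurred") into neighboring bins to act as a mass tolerance, and all query-reference pairs are then scored in one pass through a shared bin index, which is the equivalent of a sparse matrix product.

```python
from collections import defaultdict
from math import sqrt


def normalize(spectrum):
    """Scale intensities so the spectrum has unit Euclidean norm."""
    norm = sqrt(sum(inten * inten for _, inten in spectrum))
    if norm == 0:
        return []
    return [(mz, inten / norm) for mz, inten in spectrum]


def to_sparse(spectrum, bin_width, blur=0):
    """Return {bin: intensity}; blur copies each peak into +/- blur bins."""
    vec = defaultdict(float)
    for mz, inten in normalize(spectrum):
        center = int(round(mz / bin_width))
        for offset in range(-blur, blur + 1):
            vec[center + offset] += inten
    return dict(vec)


def score_all_pairs(queries, references, bin_width=0.01, blur=1):
    """Score every query against every reference without aligning peaks.

    Assumption: only the queries are blurred, so a peak pair that falls
    within the tolerance contributes its intensity product exactly once.
    """
    # Inverted index: bin -> [(reference index, intensity)].
    index = defaultdict(list)
    for j, ref in enumerate(references):
        for b, inten in to_sparse(ref, bin_width).items():
            index[b].append((j, inten))

    scores = [[0.0] * len(references) for _ in queries]
    for i, query in enumerate(queries):
        for b, q_inten in to_sparse(query, bin_width, blur).items():
            # Every reference sharing this bin is updated at once.
            for j, r_inten in index.get(b, ()):
                scores[i][j] += q_inten * r_inten
    return scores


# Hypothetical spectra as (m/z, intensity) pairs.
spectrum_a = [(100.0, 1.0), (150.0, 2.0)]
spectrum_b = [(100.01, 1.0), (150.0, 2.0)]  # first peak shifted by one bin
spectrum_c = [(300.0, 1.0)]                 # no peaks in common
scores = score_all_pairs([spectrum_a], [spectrum_a, spectrum_b, spectrum_c])
```

Because no peak is ever assigned to a single partner, one query peak can contribute to several nearby reference peaks in crowded spectra, so scores from this kind of sketch can differ from those of an alignment-based scorer.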