Optimization and testing of mass spectral library search algorithms for compound identification.

Affiliation

Atmospheric Research and Exposure Assessment Laboratory, U. S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA.

Publication information

J Am Soc Mass Spectrom. 1994 Sep;5(9):859-66. doi: 10.1016/1044-0305(94)87009-8.

Abstract

Five algorithms proposed in the literature for library search identification of unknown compounds from their low-resolution mass spectra were optimized and tested by matching test spectra against reference spectra in the NIST-EPA-NIH Mass Spectral Database. The algorithms were probability-based matching (PBM), dot-product, the Hertz et al. similarity index, Euclidean distance, and absolute value distance. The test set consisted of 12,592 alternate spectra of about 8000 compounds represented in the database. Most algorithms were optimized by varying their mass weighting and intensity scaling factors. Rank in the list of candidate compounds was used as the criterion for accuracy. The best performing algorithm (75% accuracy for rank 1) was the dot-product function, which measures the cosine of the angle between spectra represented as vectors. The other methods, in order of performance, were Euclidean distance (72%), absolute value distance (68%), PBM (65%), and Hertz et al. (64%). Intensity scaling and mass weighting were important in the optimized algorithms, with the square root of the intensity nearly optimal as the intensity scale and the square or cube of the mass the best mass weighting power. Several more complex schemes were also tested but had little effect on the results. A modest improvement in the performance of the dot-product algorithm was made by adding a term that gave additional weight to relative peak intensities for spectra with many peaks in common.
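
The dot-product search described in the abstract can be illustrated with a minimal sketch. It assumes each spectrum is a mapping from nominal m/z to intensity and that each peak is weighted as (m/z)^n times intensity^0.5, with n = 2 (the abstract reports the square or cube as the best mass power and the square root as a nearly optimal intensity scale). The function names, the default parameters, and the toy spectra are hypothetical and are not taken from the paper. The additional term for spectra with many peaks in common is not included.

```python
import math


def weight_peaks(spectrum, mass_power=2.0, intensity_power=0.5):
    """Return {m/z: (m/z)**mass_power * intensity**intensity_power}.

    Assumption: spectrum is a dict of nominal m/z -> non-negative intensity.
    """
    return {
        mz: (mz ** mass_power) * (inten ** intensity_power)
        for mz, inten in spectrum.items()
        if inten > 0
    }


def cosine_score(unknown, reference, mass_power=2.0, intensity_power=0.5):
    """Cosine of the angle between two weighted spectra treated as vectors."""
    u = weight_peaks(unknown, mass_power, intensity_power)
    r = weight_peaks(reference, mass_power, intensity_power)
    # Only peaks present in both spectra contribute to the dot product.
    dot = sum(w * r[mz] for mz, w in u.items() if mz in r)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_r = math.sqrt(sum(w * w for w in r.values()))
    if norm_u == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_u * norm_r)


def rank_candidates(unknown, library, **kwargs):
    """Return library entry names sorted from best to worst cosine score."""
    scored = [(cosine_score(unknown, spec, **kwargs), name)
              for name, spec in library.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Hypothetical toy data, for illustration only.
unknown = {43: 100, 58: 30, 71: 10}
library = {
    "candidate_A": {43: 95, 58: 35, 71: 8},    # shares all peaks with the unknown
    "candidate_B": {41: 100, 55: 60, 69: 20},  # shares no peaks with the unknown
}
ranking = rank_candidates(unknown, library)
```

In this sketch the rank-1 accuracy criterion from the abstract corresponds to checking whether the correct compound is `ranking[0]`. Passing `mass_power=3.0` switches to the cubic mass weighting.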
