Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40292, USA.
Bioinformatics. 2012 Apr 15;28(8):1158-63. doi: 10.1093/bioinformatics/bts083. Epub 2012 Feb 13.
Compound identification in gas chromatography-mass spectrometry (GC-MS) is achieved by matching an experimental mass spectrum to the mass spectra in a spectral library. Peak intensities at higher m/z values are known to be the most diagnostic in a GC-MS mass spectrum. Therefore, to increase the relative significance of these peaks, the intensities and m/z values are usually transformed with a set of weight factors. Poorly chosen weight factors can significantly decrease the accuracy of compound identification. Given the substantial growth of mass spectral databases and the broad application of GC-MS, it is important to revisit the methods for finding the optimal weight factors for high-confidence compound identification.
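The weighting step described above can be illustrated with a minimal sketch. It assumes a power-law transform, weighted peak = intensity^a × (m/z)^b, and a cosine similarity score; the abstract does not state either formula, so both are assumptions, as are the function names, the toy spectra, and the values a = 0.5, b = 2.0.

```python
import math

def weight_spectrum(spectrum, a, b):
    """Transform {m/z: intensity} into {m/z: intensity**a * mz**b}.

    The power-law form is an assumption made for illustration.
    """
    return {mz: (inten ** a) * (mz ** b) for mz, inten in spectrum.items()}

def cosine_similarity(s1, s2):
    """Cosine similarity between two spectra stored as {m/z: value}."""
    shared = s1.keys() & s2.keys()
    dot = sum(s1[mz] * s2[mz] for mz in shared)
    n1 = math.sqrt(sum(v * v for v in s1.values()))
    n2 = math.sqrt(sum(v * v for v in s2.values()))
    if n1 == 0 or n2 == 0:
        return 0.0
    return dot / (n1 * n2)

# Hypothetical toy spectra (m/z -> relative intensity), not real library entries.
query = {41: 30.0, 57: 100.0, 85: 20.0}
reference = {41: 25.0, 57: 100.0, 99: 15.0}

unweighted = cosine_similarity(query, reference)
weighted = cosine_similarity(
    weight_spectrum(query, 0.5, 2.0),      # a and b are illustrative values
    weight_spectrum(reference, 0.5, 2.0),
)
print(f"unweighted={unweighted:.4f} weighted={weighted:.4f}")
```

With b > 0, peaks at higher m/z contribute more to the score, which is the effect the weight factors are meant to produce.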
We developed a novel approach that finds the optimal weight factors using only a reference library, with the aim of high-accuracy compound identification. The approach first calculates the ratio of skewness to kurtosis of the mass spectral similarity scores among spectra (compounds) in a reference library and then selects the weight factor with the maximum ratio as the optimal weight factor. We evaluated the approach by comparing the accuracy of compound identification using the mass spectral library maintained by the National Institute of Standards and Technology. The results demonstrate that the optimal weight factors for fragment ion peak intensity and m/z value found by the developed approach outperform the current weight factors for compound identification.
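The selection rule described above can be sketched as follows. This is a simplified illustration, not the authors' R implementation: it assumes a power-law weighting with a cosine score, moment-based skewness, non-excess kurtosis, and an exhaustive search over a small hypothetical grid of (a, b) pairs. The toy library and all function names are invented for the example.

```python
import itertools
import math

def weighted_cosine(s1, s2, a, b):
    """Cosine score after the assumed transform intensity**a * mz**b."""
    w1 = {mz: (i ** a) * (mz ** b) for mz, i in s1.items()}
    w2 = {mz: (i ** a) * (mz ** b) for mz, i in s2.items()}
    dot = sum(w1[mz] * w2[mz] for mz in w1.keys() & w2.keys())
    norm = math.sqrt(sum(v * v for v in w1.values())) * \
           math.sqrt(sum(v * v for v in w2.values()))
    return dot / norm if norm else 0.0

def skew_kurt_ratio(scores):
    """Skewness / kurtosis from central moments (non-excess kurtosis assumed)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    if m2 == 0:
        return float("-inf")  # degenerate case: all scores identical
    return (m3 / m2 ** 1.5) / (m4 / m2 ** 2)

def best_weights(library, grid):
    """Return (ratio, a, b) for the grid point with the largest ratio."""
    best = None
    for a, b in grid:
        scores = [weighted_cosine(s1, s2, a, b)
                  for s1, s2 in itertools.combinations(library, 2)]
        ratio = skew_kurt_ratio(scores)
        if best is None or ratio > best[0]:
            best = (ratio, a, b)
    return best

# Hypothetical four-spectrum library (m/z -> relative intensity).
library = [
    {41: 30.0, 57: 100.0, 85: 20.0},
    {41: 25.0, 57: 100.0, 99: 15.0},
    {43: 100.0, 58: 40.0, 71: 10.0},
    {43: 80.0, 57: 60.0, 85: 35.0},
]
grid = [(a, b) for a in (0.5, 1.0) for b in (0.0, 1.0, 2.0, 3.0)]
ratio, a_opt, b_opt = best_weights(library, grid)
print(f"selected a={a_opt}, b={b_opt}, ratio={ratio:.4f}")
```

Only the reference library enters the search; no query spectra or known identifications are needed, which matches the "only through a reference library" property stated in the abstract.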
The results and R package are available at http://stage.louisville.edu/faculty/x0zhan17/software/software-development.