Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, Kentucky 40292, USA.
Anal Chem. 2012 Aug 7;84(15):6477-87. doi: 10.1021/ac301350n. Epub 2012 Jul 26.
Compound identification is a key component of data analysis in applications of gas chromatography-mass spectrometry (GC-MS). The most widely used approach to compound identification is currently mass spectrum matching, in which the dot product and its composite version serve as spectral similarity measures. Several transformations of fragment ion intensities have also been proposed to increase the accuracy of compound identification. In this study, we introduced partial and semipartial correlations as mass spectral similarity measures and applied them to compound identification together with different transformations of peak intensity. Mixture versions of the proposed measures were also developed to further improve identification accuracy. To demonstrate the performance of the proposed spectral similarity measures, the National Institute of Standards and Technology (NIST) mass spectral library and replicate spectral library were used as the reference library and the query spectra, respectively. Identification results showed that the mixture partial and semipartial correlations always outperformed both the dot product and its composite measure. The mixture similarity with semipartial correlation achieved the highest compound identification accuracy, 84.6%, with a transformation of (0.53, 1.3) for fragment ion intensity and m/z value, respectively.
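The similarity measures named above can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not confirm: the pair (0.53, 1.3) is read as exponents on intensity and m/z, as in the common weighted dot product; the partial and semipartial correlations use the standard first-order textbook formulas; and the control variable z is taken to be a second, competing reference spectrum. The function names and the toy spectra are hypothetical, and the mixture versions are not shown.

```python
import math

def transform(mz, intensity, a=0.53, b=1.3):
    """Weight each peak as intensity**a * mz**b (assumed reading of the pair)."""
    return [(i ** a) * (m ** b) for m, i in zip(mz, intensity)]

def dot_similarity(x, y):
    """Normalized dot product (cosine) of two aligned peak vectors."""
    num = sum(p * q for p, q in zip(x, y))
    return num / math.sqrt(sum(p * p for p in x) * sum(q * q for q in y))

def pearson(x, y):
    """Pearson correlation, computed as the cosine of the centered vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return dot_similarity([p - mx for p in x], [q - my for q in y])

def partial_corr(x, y, z):
    """Correlation of x and y with the linear effect of z removed from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def semipartial_corr(x, y, z):
    """Correlation of x with the part of y that z does not explain."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_yz ** 2)

# Toy spectra on a shared m/z grid (made-up values, for illustration only).
mz_grid = [41, 43, 55, 57, 71, 85]
query = transform(mz_grid, [30, 100, 20, 80, 40, 10])
ref_a = transform(mz_grid, [25, 100, 22, 75, 45, 12])  # candidate match
ref_b = transform(mz_grid, [100, 40, 60, 20, 10, 5])   # competing reference (assumed control)

if __name__ == "__main__":
    print("dot product:", dot_similarity(query, ref_a))
    print("partial:", partial_corr(query, ref_a, ref_b))
    print("semipartial:", semipartial_corr(query, ref_a, ref_b))
```

In this sketch the semipartial score removes the competing reference's contribution only from the candidate spectrum, while the partial score removes it from both the query and the candidate, so the magnitude of the semipartial value never exceeds that of the partial value.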