Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40292, USA.
Bioinformatics. 2011 Jun 15;27(12):1660-6. doi: 10.1093/bioinformatics/btr188. Epub 2011 Apr 14.
Comprehensive two-dimensional gas chromatography mass spectrometry (GC × GC-MS) offers much greater separation capacity, chemical selectivity and sensitivity for metabolomics, and provides more accurate information about metabolite retention times and mass spectra. However, retention times in both columns invariably shift, which makes it difficult to compare metabolic profiles obtained from multiple samples exposed to different experimental conditions.
Existing peak alignment algorithms for GC × GC-MS data apply the peak distance and the spectral similarity sequentially and require a predefined distance-based window, a predefined spectral similarity-based window, or both. To overcome these limitations, we developed an optimal peak alignment method based on a novel mixture similarity that uses the peak distance and spectral similarity measures simultaneously, without any variation windows. In addition, we examined the effect of four distance measures (Euclidean, Maximum, Manhattan and Canberra) on peak alignment. We compared the performance of the proposed peak alignment algorithm with existing alignment methods on two sets of GC × GC-MS data. Our analysis showed that the Canberra distance performed better than the other distances and that the proposed mixture similarity peak alignment algorithm outperformed all of the literature-reported methods.
The data and software mSPA are available at http://stage.louisville.edu/faculty/x0zhan17/software/software-development.
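As a rough illustration of the ideas summarized above, the following Python sketch computes the four distance measures on a pair of two-dimensional retention times and combines one of them with a spectral similarity into a single score. It is not the mSPA implementation. The function names, the weight `w`, the conversion of a distance to a similarity via `1 / (1 + d)`, and the use of cosine similarity for the spectra are all assumptions made for this example; the abstract does not specify them.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two retention-time pairs."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def maximum(p, q):
    """Largest absolute difference along any one dimension."""
    return max(abs(a - b) for a, b in zip(p, q))

def manhattan(p, q):
    """Sum of absolute differences over both dimensions."""
    return sum(abs(a - b) for a, b in zip(p, q))

def canberra(p, q):
    """Sum of absolute differences, each scaled by |a| + |b|.
    Terms whose denominator is zero are skipped (treated as 0)."""
    total = 0.0
    for a, b in zip(p, q):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total

def cosine_similarity(s, t):
    """Cosine of the angle between two intensity vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(s, t))
    norm = math.sqrt(sum(a * a for a in s)) * math.sqrt(sum(b * b for b in t))
    return dot / norm if norm > 0 else 0.0

def mixture_similarity(rt_a, rt_b, spec_a, spec_b, w=0.5, distance=canberra):
    """Hypothetical mixture score: a weighted sum of retention-time
    closeness and spectral similarity, evaluated together rather than
    one after the other. The weight w and the 1/(1+d) mapping are
    assumptions for illustration only."""
    closeness = 1.0 / (1.0 + distance(rt_a, rt_b))
    return w * closeness + (1.0 - w) * cosine_similarity(spec_a, spec_b)

# Two hypothetical peaks: (first-column RT, second-column RT) and a short spectrum.
peak_a = ((612.0, 1.84), [10.0, 250.0, 40.0, 5.0])
peak_b = ((615.5, 1.90), [12.0, 240.0, 38.0, 0.0])
score = mixture_similarity(peak_a[0], peak_b[0], peak_a[1], peak_b[1])
```

Because both terms contribute to one score, no separate retention-time or spectral window has to be chosen before candidate peaks are compared. In this sketch the only tuning parameter is the assumed weight `w`.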