Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40292, USA.
BMC Bioinformatics. 2011 Jun 15;12:235. doi: 10.1186/1471-2105-12-235.
Comprehensive two-dimensional gas chromatography coupled with mass spectrometry (GC × GC-MS) is a powerful technique that has gained increasing attention over the last two decades. GC × GC-MS provides greatly increased separation capacity, chemical selectivity, and sensitivity for complex sample analysis, and yields more accurate information about compound retention times and mass spectra. Despite these advantages, the retention times of the resolved peaks on the two-dimensional gas chromatographic columns are always shifted by experimental variations, which complicates data processing for metabolomics analysis. The retention time variation must therefore be adjusted before multiple metabolic profiles obtained under different conditions can be compared.
We developed novel peak alignment algorithms for both homogeneous (acquired under identical experimental conditions) and heterogeneous (acquired under different experimental conditions) GC × GC-MS data, using modified Smith-Waterman local alignment algorithms together with mass spectral similarity. Unlike algorithms reported in the literature, the proposed algorithms require neither the detection of landmark peaks nor retention time transformation. Furthermore, an automated peak alignment software package was established by implementing a likelihood function for optimal peak alignment.
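To make the idea of a local alignment scored by mass spectral similarity concrete, the following is a minimal sketch only. It uses the textbook Smith-Waterman recurrence with cosine similarity between spectra, not the modified algorithms described here, and it ignores retention times entirely. The function names, the `threshold` and `gap` values, and the representation of a spectrum as an `{m/z: intensity}` dictionary are all assumptions made for illustration; peak lists are assumed to be ordered by elution.

```python
import math

STOP, DIAG, UP, LEFT = 0, 1, 2, 3

def cosine_similarity(a, b):
    """Cosine similarity of two spectra given as {m/z: intensity} dicts."""
    dot = sum(v * b.get(mz, 0.0) for mz, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def local_align_peaks(ref, tgt, threshold=0.8, gap=0.5):
    """Textbook Smith-Waterman over two peak lists (hypothetical helper).

    The match score is (cosine similarity - threshold), so only spectra
    more similar than `threshold` contribute positively. `threshold` and
    `gap` are illustrative values, not parameters from the paper.
    Returns (best_score, list of (ref_index, tgt_index) matched pairs).
    """
    n, m = len(ref), len(tgt)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]   # score matrix
    P = [[STOP] * (m + 1) for _ in range(n + 1)]  # traceback pointers
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = cosine_similarity(ref[i - 1], tgt[j - 1]) - threshold
            candidates = [
                (0.0, STOP),                  # restart the local alignment
                (H[i - 1][j - 1] + s, DIAG),  # pair peak i with peak j
                (H[i - 1][j] - gap, UP),      # skip a reference peak
                (H[i][j - 1] - gap, LEFT),    # skip a target peak
            ]
            H[i][j], P[i][j] = max(candidates, key=lambda c: c[0])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a restart (STOP) is reached.
    pairs = []
    i, j = best_pos
    while P[i][j] != STOP:
        if P[i][j] == DIAG:
            # Report only pairs whose spectra pass the similarity threshold.
            if cosine_similarity(ref[i - 1], tgt[j - 1]) >= threshold:
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif P[i][j] == UP:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs

# Toy data: the target run contains one extra, unrelated leading peak.
reference = [{50: 100, 77: 30}, {91: 100, 65: 20}, {43: 100, 58: 40}]
target = [{120: 100}, {50: 95, 77: 35}, {91: 100, 65: 20}, {43: 100, 58: 40}]
score, pairs = local_align_peaks(reference, target)
print(pairs)  # [(0, 1), (1, 2), (2, 3)]
```

Because the recurrence can restart at zero, the unrelated leading peak in the target is simply left out of the alignment. This illustrates, under the assumptions above, how a local alignment can match peaks without landmark peaks or a retention time transformation.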
The proposed Smith-Waterman (SW) local alignment-based algorithms can align both homogeneous and heterogeneous data from multiple GC × GC-MS experiments without transforming retention times or selecting landmark peaks. An optimal version of the SW-based algorithms was also established, based on the associated likelihood function, for automatic peak alignment. In analyses of experimental data from a mixture of compound standards and from a metabolite extract of mouse plasma with spiked-in compound standards, the proposed alignment algorithms outperformed the literature-reported alignment method.