DSM Resolve, Dept. Process Analysis & Statistics, Geleen, The Netherlands.
Anal Chim Acta. 2012 May 13;726:9-21. doi: 10.1016/j.aca.2012.03.009. Epub 2012 Mar 15.
Comprehensive two-dimensional gas chromatography coupled to mass spectrometry (GC×GC-MS) is a powerful tool for analyzing complex samples. Applying the technique in studies such as biomarker discovery, in which large sets of complex samples must be analyzed, requires extensive preprocessing to align the data obtained from separate injections (analyses). We developed new alignment and clustering algorithms for this type of data. What is new in these procedures is the consistent treatment of the phenomenon known as wrap-around. The data analysis problems associated with this phenomenon are solved by treating the 2D display as the surface of a three-dimensional cylinder. Based on this transformation, we developed a new similarity metric for features as a function of both the cylindrical distance (reflecting similarity in chromatographic behavior) and the mass spectral correlation (reflecting similarity in chemical structure). These concepts are used in warping and clustering, and include protection against greedy warping. As an example, the methods were applied to the analysis of 11 replicates of a human urine sample concentrated by solid-phase extraction. The alignment is shown to be well protected against greedy warping, which is important for analytical qualities such as robustness and repeatability. It is also demonstrated that chemically similar features are clustered together. The paper is organized as follows. First, a brief introduction addresses the background of the GC×GC-MS data structure, followed by a theoretical section with a conceptual description of the procedures and details of the algorithms. Finally, an example in the experimental section illustrates the application of the procedures.
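The cylinder idea described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the formulas, so the function names, the Gaussian distance kernel, the `sigma` parameter, the axis weights, and the product form of the similarity are all assumptions made for illustration. The sketch wraps the second-dimension retention time around the modulation period, so that two peaks near opposite edges of the 2D display are treated as close, and then combines that cylindrical distance with a Pearson correlation of the mass spectra.

```python
import math

def wrapped_difference(t_a, t_b, period):
    """Shortest difference between two second-dimension retention times
    when the time axis is wrapped around a circle of length `period`."""
    d = abs(t_a - t_b) % period
    return min(d, period - d)

def cylindrical_distance(a, b, period, w1=1.0, w2=1.0):
    """Distance along the surface of a cylinder between two features.

    a, b   : (first_dim_time, second_dim_time) tuples
    period : modulation period (circumference of the cylinder)
    w1, w2 : hypothetical weights to put both axes on a comparable scale
    """
    d1 = (a[0] - b[0]) * w1                           # along the cylinder axis
    d2 = wrapped_difference(a[1], b[1], period) * w2  # around the circumference
    return math.hypot(d1, d2)

def spectral_correlation(x, y):
    """Pearson correlation between two equally long intensity vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a flat spectrum
    return sxy / math.sqrt(sxx * syy)

def similarity(a, b, spec_a, spec_b, period, sigma=1.0):
    """Assumed combination: Gaussian kernel on the cylindrical distance
    multiplied by the non-negative part of the spectral correlation."""
    d = cylindrical_distance(a, b, period)
    closeness = math.exp(-0.5 * (d / sigma) ** 2)
    return closeness * max(spectral_correlation(spec_a, spec_b), 0.0)
```

With an assumed modulation period of 6.0 s, two features at second-dimension times 0.1 s and 5.9 s are 0.2 s apart on the cylinder rather than 5.8 s apart on the flat display, which is the kind of discontinuity the wrap-around treatment is meant to remove.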