Department of Chemistry, University of Washington, Box 351700, Seattle, Washington 98198, United States.
Pacific Northwest National Laboratory, Battelle Boulevard, P.O. Box 999, Richland, Washington 99352, United States.
Anal Chem. 2017 Mar 21;89(6):3606-3612. doi: 10.1021/acs.analchem.6b04991. Epub 2017 Feb 27.
We report a quantitative approach to optimize implementation of discovery-based software for comprehensive two-dimensional gas chromatography coupled with time-of-flight mass spectrometry (GC × GC-TOFMS). The software performs a tile-based Fisher ratio (F-ratio) analysis and facilitates a supervised nontargeted analysis based upon the experimental design to aid in the discovery of analytes with statistically different variances between sample classes. The quantitative approach for software optimization uses receiver operating characteristic (ROC) curves. The area under the curve (AUC) for each ROC curve serves as a quantitative metric to optimize two key algorithm parameters: the signal-to-noise ratio (S/N) threshold applied to the data prior to calculating F-ratios at each m/z mass channel, and the number of these per-m/z F-ratios used to calculate the average F-ratio of a tile. A total of 25 combinations of S/N threshold and number of m/z were studied. Fifty analytes were spiked into a diesel fuel at two concentration levels to produce two sample classes that should in principle produce 50 positive instances in the ROC curves. The "sweet spot" for F-ratio analysis was determined to be an S/N threshold of 10 coupled with a maximum of the 10 most chemically selective m/z (requiring a minimum of 3 m/z), corresponding to an ∼21% improvement in the discrimination of true positives relative to prior studies. This equates to an additional 9 true positives being discovered at a false positive probability of 0.2 and 5 additional true positives being found overall. Furthermore, optimization of these software parameters did not depend upon a priori determination of the statistically correct number of positive instances in the sample classes. The AUC metric appears to be suitable for the evaluation of all data analysis methods that utilize the proper experimental design.
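The two tuned parameters and the AUC metric described above can be illustrated with a minimal sketch. This is not the authors' software. The function names (`fisher_ratio`, `tile_f_ratio`, `roc_auc`), the single scalar `noise` estimate, the definition of S/N as the maximum tile signal divided by that noise, and the ranking of "most chemically selective" m/z by their F-ratio are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

def fisher_ratio(class_a, class_b):
    """Two-class one-way ANOVA F-ratio for each m/z column.

    class_a, class_b: arrays of shape (n_replicates, n_mz).
    """
    groups = [np.asarray(class_a, float), np.asarray(class_b, float)]
    n_total = sum(g.shape[0] for g in groups)
    grand_mean = np.concatenate(groups).mean(axis=0)
    # Between-class variance (degrees of freedom: classes - 1).
    between = sum(g.shape[0] * (g.mean(axis=0) - grand_mean) ** 2
                  for g in groups) / (len(groups) - 1)
    # Pooled within-class variance (degrees of freedom: samples - classes).
    within = sum(((g - g.mean(axis=0)) ** 2).sum(axis=0)
                 for g in groups) / (n_total - len(groups))
    # Assumption: channels with zero within-class variance score 0 here.
    return np.divide(between, within,
                     out=np.zeros_like(between), where=within > 0)

def tile_f_ratio(class_a, class_b, noise=1.0,
                 sn_threshold=10, max_mz=10, min_mz=3):
    """Average F-ratio of one tile using the two parameters from the abstract.

    Hypothetical S/N definition: maximum signal in the tile / noise.
    """
    a = np.asarray(class_a, float)
    b = np.asarray(class_b, float)
    signal = np.vstack([a, b]).max(axis=0)
    keep = signal / noise >= sn_threshold
    if keep.sum() < min_mz:
        return 0.0  # too few usable m/z channels to score this tile
    f = fisher_ratio(a[:, keep], b[:, keep])
    top = np.sort(f)[::-1][:max_mz]  # assumed proxy for selectivity
    return float(top.mean())

def roc_auc(scores, labels):
    """AUC of the ROC curve built by ranking hits from high to low score.

    labels: 1 for a spiked (true positive) analyte, 0 otherwise.
    Tied scores are not treated specially in this sketch.
    """
    order = np.argsort(scores)[::-1]
    y = np.asarray(labels)[order]
    tpr = np.concatenate([[0.0], np.cumsum(y == 1) / (y == 1).sum()])
    fpr = np.concatenate([[0.0], np.cumsum(y == 0) / (y == 0).sum()])
    # Trapezoidal integration of TPR over FPR.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```

Under these assumptions, a parameter study like the one in the abstract would call `tile_f_ratio` for every tile at each of the 25 (S/N threshold, number of m/z) combinations, rank the tiles by score, and compare the resulting `roc_auc` values.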