Enhancing partial least squares modeling of comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry data by tile-based variance ranking.

Affiliation

Department of Chemistry, University of Washington, Box 351700, Seattle, WA, 98195, USA.

Publication information

J Chromatogr A. 2023 Apr 12;1694:463920. doi: 10.1016/j.chroma.2023.463920. Epub 2023 Mar 11.

DOI:10.1016/j.chroma.2023.463920

Abstract

Chemometric methods such as partial least squares (PLS) regression are valuable for correlating sample-based differences hidden in comprehensive two-dimensional gas chromatography (GC × GC) data to independently measured physicochemical properties. This work establishes the first implementation of tile-based variance ranking as a selective data reduction methodology to improve PLS modeling performance for 58 diverse aerospace fuels. Tile-based variance ranking discovered a total of 521 analytes with a square of the relative standard deviation (RSD) in signal ranging from 0.07 to 22.84. The goodness-of-fit of the models was determined by their normalized root-mean-square error of cross-validation (NRMSECV) and normalized root-mean-square error of prediction (NRMSEP). PLS models developed for viscosity, hydrogen content, and heat of combustion using all 521 features discovered by tile-based variance ranking had respective NRMSECV (NRMSEP) values of 10.5 % (10.2 %), 8.3 % (7.6 %), and 13.1 % (13.5 %). In contrast, use of a single-grid binning scheme, a common data reduction strategy for PLS analysis, resulted in less accurate models for viscosity (NRMSECV = 14.2 %; NRMSEP = 14.3 %), hydrogen content (NRMSECV = 12.1 %; NRMSEP = 11.0 %), and heat of combustion (NRMSECV = 14.4 %; NRMSEP = 13.6 %). Further, the features discovered by tile-based variance ranking can be optimized for each PLS model with RReliefF analysis, a machine learning algorithm. RReliefF feature optimization selected 48, 125, and 172 analytes out of the original 521 discovered by tile-based variance ranking to model viscosity, hydrogen content, and heat of combustion, respectively. The RReliefF-optimized features yielded highly accurate property-composition models for viscosity (NRMSECV = 7.9 %; NRMSEP = 5.8 %), hydrogen content (NRMSECV = 7.0 %; NRMSEP = 4.9 %), and heat of combustion (NRMSECV = 7.9 %; NRMSEP = 8.4 %). This work also demonstrates that processing the chromatograms with a tile-based approach allows the analyst to directly identify the analytes of importance in a PLS model. Coupling tile-based feature selection with PLS analysis allows for deeper understanding in any property-composition study.
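
The two quantities the abstract reports, the squared RSD used to rank features and the normalized root-mean-square error used to score models, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how the errors are normalized or which standard deviation estimator is used, so normalizing by the range of the measured values and using the sample standard deviation are both assumptions here, and the function and feature names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def squared_rsd(signals):
    """Square of the relative standard deviation (stdev / mean) of one
    feature's signal across samples. Sample stdev is an assumption."""
    return (stdev(signals) / mean(signals)) ** 2


def rank_by_variance(features):
    """Given {feature_name: [signal per sample]}, return the feature names
    sorted from highest to lowest squared RSD."""
    return sorted(features, key=lambda name: squared_rsd(features[name]),
                  reverse=True)


def nrmse(measured, predicted):
    """Root-mean-square error divided by the range of the measured values.
    Range normalization is an assumption; the abstract does not specify it."""
    n = len(measured)
    rmse = sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)
    return rmse / (max(measured) - min(measured))


if __name__ == "__main__":
    # Hypothetical signals for two features across three samples.
    features = {"feature_a": [1.0, 1.0, 1.0], "feature_b": [1.0, 2.0, 3.0]}
    print(rank_by_variance(features))
    print(nrmse([0.0, 10.0], [1.0, 9.0]))
```

Applied to predictions from held-out cross-validation folds, `nrmse` would correspond to an NRMSECV-style figure; applied to an independent test set, to an NRMSEP-style figure.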

Similar articles

Enhancing partial least squares modeling of comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry data by tile-based variance ranking.

J Chromatogr A. 2023 Apr 12;1694:463920. doi: 10.1016/j.chroma.2023.463920. Epub 2023 Mar 11.

Fuel property modeling by high-speed gas chromatography coupled with partial least squares data analysis.

J Chromatogr A. 2024 Sep 13;1732:465220. doi: 10.1016/j.chroma.2024.465220. Epub 2024 Jul 31.

Correlation of rocket propulsion fuel properties with chemical composition using comprehensive two-dimensional gas chromatography with time-of-flight mass spectrometry followed by partial least squares regression analysis.

J Chromatogr A. 2014 Jan 31;1327:132-40. doi: 10.1016/j.chroma.2013.12.060. Epub 2013 Dec 30.

Modeling RP-1 fuel advanced distillation data using comprehensive two-dimensional gas chromatography coupled with time-of-flight mass spectrometry and partial least squares analysis.

Anal Bioanal Chem. 2015 Jan;407(1):321-30. doi: 10.1007/s00216-014-8233-6. Epub 2014 Oct 15.

Using solid-phase extraction to facilitate a focused tile-based Fisher ratio analysis of comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry data: comparative analysis of aerospace fuel composition.

Anal Bioanal Chem. 2023 May;415(13):2411-2423. doi: 10.1007/s00216-022-04348-1. Epub 2022 Oct 1.

Tile-based variance rank initiated-unsupervised sample indexing for comprehensive two-dimensional gas chromatography-time-of-flight mass spectrometry.

Anal Chim Acta. 2022 May 29;1209:339847. doi: 10.1016/j.aca.2022.339847. Epub 2022 Apr 19.

Simulating comprehensive two-dimensional gas chromatography mass spectrometry data with realistic run-to-run shifting to evaluate the robustness of tile-based Fisher ratio analysis.

J Chromatogr A. 2022 Aug 16;1677:463321. doi: 10.1016/j.chroma.2022.463321. Epub 2022 Jul 10.

Investigation of the limit of discovery using tile-based Fisher ratio analysis with comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry.

J Chromatogr A. 2021 May 10;1644:462092. doi: 10.1016/j.chroma.2021.462092. Epub 2021 Mar 22.

Comprehensive two-dimensional gas chromatography in combination with pixel-based analysis for fouling tendency prediction.

J Chromatogr A. 2017 Jun 9;1501:89-98. doi: 10.1016/j.chroma.2017.04.021. Epub 2017 Apr 14.

Minimum variance optimized Fisher ratio analysis of comprehensive two-dimensional gas chromatography / mass spectrometry data: Study of the pacu fish metabolome.

J Chromatogr A. 2022 Mar 29;1667:462868. doi: 10.1016/j.chroma.2022.462868. Epub 2022 Feb 8.
