Machine Learning in Complex Organic Mixtures: Applying Domain Knowledge Allows for Meaningful Performance with Small Data Sets.

Author information

Le Katelyn, Radović Jagoš R, MacCallum Justin L, Larter Stephen R, Van Humbeck Jeffrey F

Affiliations

Department of Chemistry, University of Calgary, Calgary, Alberta T2N 1N4, Canada.

Center for Petroleum Geochemistry (UH-CPG), Department of Earth and Atmospheric Sciences, University of Houston, Houston, Texas 77204-5007, United States.

Publication information

J Am Chem Soc. 2024 Aug 14;146(32):22563-22569. doi: 10.1021/jacs.4c06595. Epub 2024 Jul 31.

DOI:10.1021/jacs.4c06595

PMID:39082215

Abstract

The ability to quantify individual components of complex mixtures is a challenge found throughout the life and physical sciences. An improved capacity to generate large data sets along with the uptake of machine-learning (ML)-based analysis tools has allowed for various "omics" disciplines to realize exceptional advances. Other areas of chemistry that deal with complex mixtures often do not leverage these advances. Environmental samples, for example, can be more difficult to access, and the resulting small data sets are less appropriate for unconstrained ML approaches. Herein, we present an approach to address this latter issue. Using a very small environmental data set (35 high-resolution mass spectra gathered from various solvent extractions of Canadian petroleum fractions), we show that the application of specific domain knowledge can lead to ML models with notable performance.

Similar articles

Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification

Personal exposure to mixtures of volatile organic compounds: modeling and further analysis of the RIOPA data.

Res Rep Health Eff Inst. 2014 Jun(181):3-63.

Fully Automated Unconstrained Analysis of High-Resolution Mass Spectrometry Data with Machine Learning.

J Am Chem Soc. 2022 Aug 17;144(32):14590-14606. doi: 10.1021/jacs.2c03631. Epub 2022 Aug 8.

Machine Learning-Assisted QSAR Models on Contaminant Reactivity Toward Four Oxidants: Combining Small Data Sets and Knowledge Transfer.

Environ Sci Technol. 2022 Jan 4;56(1):681-692. doi: 10.1021/acs.est.1c04883. Epub 2021 Dec 15.

Macromolecular crowding: chemistry and physics meet biology (Ascona, Switzerland, 10-14 June 2012).

Phys Biol. 2013 Aug;10(4):040301. doi: 10.1088/1478-3975/10/4/040301. Epub 2013 Aug 2.

Part 1. Statistical Learning Methods for the Effects of Multiple Air Pollution Constituents.

Res Rep Health Eff Inst. 2015 Jun(183 Pt 1-2):5-50.

The future of Cochrane Neonatal.

Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Sulfur organic compounds in bottom sediments of the eastern Gulf of Finland.

Environ Sci Pollut Res Int. 2007 Sep;14(6):366-76. doi: 10.1065/espr2006.08.334.

Cited by

Bayesian Meta-Learning for Few-Shot Reaction Outcome Prediction of Asymmetric Hydrogenation of Olefins.

Angew Chem Int Ed Engl. 2025 Jul;64(27):e202503821. doi: 10.1002/anie.202503821. Epub 2025 May 2.

Machine Learning Framework for Conotoxin Class and Molecular Target Prediction.

Toxins (Basel). 2024 Nov 3;16(11):475. doi: 10.3390/toxins16110475.
