Machine Learning in Complex Organic Mixtures: Applying Domain Knowledge Allows for Meaningful Performance with Small Data Sets.

Author information

Le Katelyn, Radović Jagoš R, MacCallum Justin L, Larter Stephen R, Van Humbeck Jeffrey F

Affiliations

Department of Chemistry, University of Calgary, Calgary, Alberta T2N 1N4, Canada.

Center for Petroleum Geochemistry (UH-CPG), Department of Earth and Atmospheric Sciences, University of Houston, Houston, Texas 77204-5007, United States.

Publication information

J Am Chem Soc. 2024 Aug 14;146(32):22563-22569. doi: 10.1021/jacs.4c06595. Epub 2024 Jul 31.

Abstract

The ability to quantify individual components of complex mixtures is a challenge found throughout the life and physical sciences. An improved capacity to generate large data sets, along with the uptake of machine-learning (ML)-based analysis tools, has allowed various "omics" disciplines to realize exceptional advances. Other areas of chemistry that deal with complex mixtures often do not leverage these advances. Environmental samples, for example, can be more difficult to access, and the resulting small data sets are less appropriate for unconstrained ML approaches. Herein, we present an approach to address this latter issue. Using a very small environmental data set (35 high-resolution mass spectra gathered from various solvent extractions of Canadian petroleum fractions), we show that the application of specific domain knowledge can lead to ML models with notable performance.

