Department of Ocean Science and Center for Ocean Research in Hong Kong and Macau, The Hong Kong University of Science and Technology, Hong Kong, Hong Kong SAR, China.
School of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou 311300, China.
Environ Sci Technol. 2023 Nov 21;57(46):17900-17909. doi: 10.1021/acs.est.3c00221. Epub 2023 Apr 20.
Dissolved organic matter (DOM) is a complex mixture of molecules that constitutes one of the largest reservoirs of organic matter on Earth. While stable carbon isotope values (δ13C) provide valuable insights into DOM transformations from land to ocean, it remains unclear how individual molecules respond to changes in bulk DOM properties such as δ13C. To address this, we employed Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) to characterize the molecular composition of DOM in 510 samples from China's coastal environments, 320 of which had δ13C measurements. Using a machine learning model based on 5199 molecular formulas, we predicted δ13C values with a mean absolute error (MAE) of 0.30‰ on the training data set, outperforming traditional linear regression (MAE 0.85‰). Our findings suggest that degradation processes, microbial activity, and primary production regulate DOM along the river-to-ocean continuum. Additionally, the machine learning model accurately predicted δ13C values for samples lacking measured δ13C and for other published data sets, reproducing the δ13C trend along the land-to-ocean continuum. This study demonstrates the potential of machine learning to capture the complex relationships between DOM composition and bulk parameters, a potential likely to grow as learning data sets become larger and molecular-level studies accumulate.
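The comparison between the machine learning model (MAE 0.30‰) and linear regression (MAE 0.85‰) rests on the mean absolute error: the average absolute difference between measured and predicted δ13C across samples. The minimal Python sketch below computes it; the sample values and both prediction lists are hypothetical illustrations, not data from the study, and neither list stands for the authors' actual model.

```python
# Minimal sketch: mean absolute error (MAE) between measured and predicted
# δ13C values (‰). All numbers below are hypothetical, not data from the study.

def mean_absolute_error(measured: list[float], predicted: list[float]) -> float:
    """Return the average of |measured - predicted| over paired samples."""
    if not measured or len(measured) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


# Hypothetical δ13C values (‰) for four samples along a river-to-ocean transect.
measured = [-27.0, -25.5, -23.0, -21.5]
model_a = [-26.8, -25.9, -23.1, -21.2]  # hypothetical predictions, smaller errors
model_b = [-26.0, -24.5, -24.0, -22.5]  # hypothetical predictions, larger errors

print(f"MAE model A: {mean_absolute_error(measured, model_a):.2f} ‰")
print(f"MAE model B: {mean_absolute_error(measured, model_b):.2f} ‰")
```

Because MAE is expressed in the same units as δ13C (‰), it can be read directly as the typical per-sample prediction error, which is why it suits a comparison like the one in the abstract.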