
Interpretable Machine Learning Models and Symbolic Regressions Reveal Transfer of Per- and Polyfluoroalkyl Substances (PFASs) in Plants: A New Small-Data Machine Learning Method to Augment Data and Obtain Predictive Equations.

Authors

Zhang Yuan, Li Yanting, Li Yang, Zhao Lin, Yang Yongkui

Affiliations

School of Environmental Science and Engineering, Tianjin University, Tianjin 300350, China.

Georgia Tech Shenzhen Institute, Tianjin University, Shenzhen 518071, China.

Publication information

Toxics. 2025 Jul 10;13(7):579. doi: 10.3390/toxics13070579.

DOI:10.3390/toxics13070579
PMID:40711024
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12300769/
Abstract

Machine learning (ML) techniques are becoming increasingly valuable for modeling the transport of pollutants in plant systems. However, two challenges remain when using ML to predict migration in hydroponic systems: small sample sizes and a lack of quantitative calculation functions. For the bioaccumulation of per- and polyfluoroalkyl substances (PFASs), we studied the key factors and quantitative calculation equations based on data augmentation, ML, and symbolic regression. First, feature expansion was performed on the input data after data preprocessing; the most important step was data augmentation. The original training set was expanded nine times by combining the synthetic minority oversampling technique (SMOTE) and a variational autoencoder (VAE). Subsequently, four ML models were applied to the test set to predict the selected output parameters. Categorical boosting (CatBoost) had the highest prediction accuracy (R² = 0.83). The Shapley Additive Explanation (SHAP) values indicated that molecular weight and exposure time were the most important parameters. We applied three symbolic regression models to obtain accurate prediction equations based on the original and augmented data. Based on augmented data, the high-dimensional sparse interaction equation exhibited the highest accuracy (R² = 0.776). Our results indicate that this method could provide crucial insights into absorption and accumulation in plant roots.
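The augmentation step can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only a SMOTE-style interpolation between nearest neighbours, omits the variational autoencoder entirely, and reads "expanded nine times" as a final training set nine times the original size. The function names `smote_like_augment` and `r2_score`, the parameter `k`, and the toy data in the usage note are all hypothetical. NumPy is the only dependency.

```python
import numpy as np


def smote_like_augment(X, y, factor=9, k=3, seed=0):
    """Return the original samples plus SMOTE-style interpolated ones.

    Assumption: the result is `factor` times the original size, so
    (factor - 1) * n synthetic rows are generated. The target y is
    interpolated with the same weight as the features, which is one
    common adaptation of SMOTE to regression, not the paper's recipe.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(X)
    k = min(k, n - 1)

    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    n_new = (factor - 1) * n
    base = rng.integers(0, n, size=n_new)        # seed sample for each new row
    mate = neighbours[base, rng.integers(0, k, size=n_new)]
    lam = rng.random(n_new)                      # interpolation weight in [0, 1)

    X_new = X[base] + lam[:, None] * (X[mate] - X[base])
    y_new = y[base] + lam * (y[mate] - y[base])
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

For example, calling `smote_like_augment(X, y)` on a toy training set of 10 rows returns 90 rows, the first 10 of which are the untouched originals. Only the training set should be augmented; the test set used to report R² stays as measured.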

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63dd/12300769/2a71951d023e/toxics-13-00579-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63dd/12300769/6c26cc7e2c03/toxics-13-00579-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63dd/12300769/e39c5fa0034d/toxics-13-00579-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63dd/12300769/3172c217a67f/toxics-13-00579-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63dd/12300769/e67f31a901c6/toxics-13-00579-g005a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63dd/12300769/9e28b78d236d/toxics-13-00579-g006.jpg
