Data augmentation and machine learning techniques for control strategy development in bio-polymerization process.

Authors

Wei Sizhou, Chen Zhiyuan, Arumugasamy Senthil Kumar, Chew Irene Mei Leng

Affiliations

School of Computer Science, University of Nottingham, Nottingham, NG8 1BB, United Kingdom.

School of Computer Science, University of Nottingham Malaysia, Semenyih, 43500, Malaysia.

Publication information

Environ Sci Ecotechnol. 2022 Apr 20;11:100172. doi: 10.1016/j.ese.2022.100172. eCollection 2022 Jul.

DOI: 10.1016/j.ese.2022.100172
PMID: 36158757
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9357554/
Abstract

Machine learning has been increasingly used in biochemistry. However, in organic chemistry and other experiment-based fields, data collected from real experiments are inadequate and the current coronavirus disease (COVID-19) pandemic has made the situation even worse. Such limited data resources may result in the low performance of modeling and affect the proper development of a control strategy. This paper proposes a feasible machine learning solution to the problem of small sample size in the bio-polymerization process. To avoid overfitting, the variational auto-encoder and generative adversarial network algorithms are used for data augmentation. The random forest and artificial neural network algorithms are implemented in the modeling process. The results prove that data augmentation techniques effectively improve the performance of the regression model. Several machine learning models were compared and the experimental results show that the random forest model with data augmentation by the generative adversarial network technique achieved the best performance in predicting the molecular weight on the training set (with an R² of 0.94) and on the test set (with an R² of 0.74), and the coefficient of determination of this model was 0.74.
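
The abstract compares models by their coefficient of determination on a training set and on a test set. As a minimal sketch of how that metric is computed, the snippet below evaluates 1 - SS_res / SS_tot on two small splits. The function name `r_squared` and all numeric values are hypothetical illustrations, not data or code from the paper.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    y_bar = mean(y_true)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the targets around their mean.
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical molecular-weight values (arbitrary units), NOT data from the paper.
train_true = [10.0, 12.0, 15.0, 18.0, 20.0]
train_pred = [10.5, 11.5, 15.5, 17.5, 20.5]
test_true = [11.0, 14.0, 19.0]
test_pred = [13.0, 13.0, 17.0]

r2_train = r_squared(train_true, train_pred)
r2_test = r_squared(test_true, test_pred)
print(f"train R^2 = {r2_train:.2f}, test R^2 = {r2_test:.2f}")
```

A score that is noticeably lower on the test split than on the training split, as in these made-up numbers, is the usual sign of some overfitting. That is the problem the abstract says data augmentation is meant to reduce.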


