Mayorova Oksana A, Saveleva Mariia S, Bratashov Daniil N, Prikhozhdenko Ekaterina S
Science Medical Center, Saratov State University, 83 Astrakhanskaya Str., 410012 Saratov, Russia.
Polymers (Basel). 2024 Feb 29;16(5):666. doi: 10.3390/polym16050666.
Macromolecules and their complexes remain topics of interest in fields such as targeted drug delivery and tissue regeneration. The complex chemical structure of such substances can be studied with a combination of Raman spectroscopy and machine learning. The complex of whey protein isolate (WPI) and hyaluronic acid (HA) is beneficial for drug delivery because it combines the properties of HA with the stability provided by WPI. However, differences between WPI-HA and WPI solutions can be difficult to detect by Raman spectroscopy, especially when low HA concentrations (0.1, 0.25, 0.5% w/v) and a constant WPI concentration (5% w/v) are used. Before applying the machine learning techniques, all the collected data were divided into training and test sets in a ratio of 3:1. The performance of two ensemble methods, random forest (RF) and gradient boosting (GB), was evaluated on the Raman data for both regression and classification problems. The impact of noise reduction using principal component analysis (PCA) on the performance of the two machine learning methods was also assessed. This procedure allowed us to reduce the number of features while retaining 95% of the explained variance in the data. Another application of these machine learning methods was to identify the WPI Raman bands that changed the most with the addition of HA. Both RF and GB provide feature importance data that can be plotted together with the actual Raman spectra of the samples. The results show that the addition of HA to WPI led to changes mainly around 1003 cm⁻¹ (corresponding to the ring breathing mode of phenylalanine) and 1400 cm⁻¹, as demonstrated by both the regression and classification models. For selected Raman bands with a feature importance greater than 1%, the effect of the amount of HA on the Raman intensities was evaluated directly, but this evaluation was found not to be informative.
Thus, applying the RF or GB estimators to the Raman data, together with feature importance evaluation, could detect and highlight small differences in the spectra of substances that arose from changes in chemical structure; using PCA to filter out noise in the Raman data could improve the performance of both RF and GB. The demonstrated results will make it possible to analyze changes in chemical bonds during various processes, for example conjugation, and to study complex mixtures of substances, even with small additions of the components of interest.
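The workflow summarized above (3:1 split, RF and GB with and without PCA retaining 95% of the explained variance, then feature importance per Raman band) might be sketched roughly as follows. This is only an illustration using scikit-learn on synthetic spectra, not the authors' code or data: the Gaussian band shapes, the noise level, the number of spectra per HA level, and all variable names are assumptions made for the example, and only the regression case is shown.

```python
import numpy as np
from sklearn.base import clone
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in for Raman spectra; none of this is the study's data.
wavenumbers = np.linspace(600.0, 1800.0, 300)       # Raman shift axis, cm^-1
ha_levels = np.array([0.0, 0.1, 0.25, 0.5])         # HA concentration, % w/v
labels = np.repeat(np.arange(ha_levels.size), 40)   # assumed 40 spectra per level
y = ha_levels[labels]

def band(center, width=8.0):
    """Gaussian band on the wavenumber axis (assumed line shape)."""
    return np.exp(-0.5 * ((wavenumbers - center) / width) ** 2)

# Constant background plus two bands whose intensity scales with HA content.
background = band(1003) + 0.8 * band(1450) + 0.6 * band(1660)
ha_response = 0.5 * band(1003) + 0.3 * band(1400)
X = background + np.outer(y, ha_response)
X += rng.normal(scale=0.005, size=X.shape)          # assumed noise level

# 3:1 train/test split, stratified by HA level.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=labels, random_state=0
)

estimators = {
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "GB": GradientBoostingRegressor(random_state=0),
}

# Compare each estimator on raw spectra and on PCA-reduced spectra.
scores = {}
for name, est in estimators.items():
    raw = clone(est).fit(X_train, y_train)
    reduced = make_pipeline(
        PCA(n_components=0.95, svd_solver="full"),   # keep 95% of the variance
        clone(est),
    ).fit(X_train, y_train)
    scores[name] = {
        "raw": raw.score(X_test, y_test),            # R^2 on the test set
        "pca": reduced.score(X_test, y_test),
    }

# Feature importance per wavenumber from a model fitted on the raw spectra.
rf = clone(estimators["RF"]).fit(X_train, y_train)
importances = rf.feature_importances_
important_bands = wavenumbers[importances > 0.01]    # bands above the 1% threshold
```

Here `important_bands` holds the wavenumbers whose importance exceeds 1%, which could be overlaid on the mean spectrum to see which bands respond to HA. A classification variant would swap in `RandomForestClassifier` and `GradientBoostingClassifier` with the HA level as the class label. Note that this sketch reads importances from a model fitted on the raw spectra so that they map directly onto wavenumbers; how the study related PCA-reduced features back to Raman bands is not stated in the abstract.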