Huang Yongdi, Chen Qionghai, Zhang Zhiyu, Gao Ke, Hu Anwen, Dong Yining, Liu Jun, Cui Lihong
College of Mathematics and Physics, Beijing University of Chemical Technology, Beijing 100029, China.
State Key Laboratory of Organic-Inorganic Composites, Beijing University of Chemical Technology, Beijing 100029, China.
Polymers (Basel). 2022 May 6;14(9):1897. doi: 10.3390/polym14091897.
Natural rubber (NR) has attracted considerable scientific and technological attention because of its excellent mechanical properties. Molecular dynamics (MD) simulations make it possible to examine, at the molecular level, how key structural factors affect tensile stress. However, this high-precision method is computationally expensive and time-consuming, which limits its application. Combining machine learning (ML) with MD is one of the most promising ways to speed up simulations while preserving the accuracy of the results. In this work, a surrogate ML method trained on MD data is developed to predict not only the tensile stress of NR but also other mechanical behaviors. Building on our previous experience with small-sample prediction, we propose a novel approach based on feature processing. The proposed ML method consists of (i) an extreme gradient boosting (XGB) model that predicts the tensile stress of NR, and (ii) a data augmentation algorithm based on nearest-neighbor interpolation (NNI) and the synthetic minority oversampling technique (SMOTE) that maximizes the use of limited training data. Within this data augmentation algorithm, NNI interpolates in the neighborhood of each original sample so that the augmented data approach the original sample distribution, while SMOTE addresses sample imbalance by interpolating at the cluster boundaries of minority samples. The augmented samples are then used to build the XGB prediction model. Finally, high performance values support the robustness and predictive ability of the proposed models, indicating that the obtained regression models have good internal and external predictive capacities.
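The interpolation idea behind the augmentation step can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how neighbors are chosen or how targets are assigned, so the code below assumes Euclidean nearest neighbors and linear interpolation of both features and target along the segment joining a sample to its nearest neighbor. The function names `nearest_neighbor` and `augment` are hypothetical, and only the Python standard library is used.

```python
import math
import random


def nearest_neighbor(index, features):
    """Return the index of the sample closest (Euclidean) to features[index]."""
    best, best_dist = None, math.inf
    for j, other in enumerate(features):
        if j == index:
            continue  # a sample is not its own neighbor
        dist = math.dist(features[index], other)
        if dist < best_dist:
            best, best_dist = j, dist
    return best


def augment(features, targets, n_new, seed=0):
    """Create n_new synthetic (x, y) pairs by neighbor interpolation.

    Assumption (not from the paper): each synthetic point lies at a random
    fraction t in [0, 1) along the segment between a randomly chosen sample
    and its nearest neighbor, and the target is interpolated with the same t.
    """
    rng = random.Random(seed)
    new_x, new_y = [], []
    for _ in range(n_new):
        i = rng.randrange(len(features))
        j = nearest_neighbor(i, features)
        t = rng.random()
        x = [a + t * (b - a) for a, b in zip(features[i], features[j])]
        y = targets[i] + t * (targets[j] - targets[i])
        new_x.append(x)
        new_y.append(y)
    return new_x, new_y
```

In a workflow like the one described, the synthetic pairs would be appended to the original MD samples before fitting the regression model. Whether linearly interpolated targets are physically reasonable depends on how smoothly the stress varies between neighboring samples, which would need to be checked against held-out MD data.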