Liu Qinrui, Broderick Scott R
Department of Materials Design and Innovation, University at Buffalo, Buffalo, NY 14260, USA.
Int J Mol Sci. 2025 Jul 30;26(15):7344. doi: 10.3390/ijms26157344.
The purpose of this paper is to use an informatics-based analysis to develop a rational design approach for the accelerated screening of nano-composite materials. Using existing nano-composite data, we develop a quantitative structure-activity relationship (QSAR) that predicts electrical conductivity as a function of polymer matrix chemistry and nano-additive volume. Developing such a QSAR is challenging because the model must represent the polymer matrix chemistry and backbone structure, the additive content, and the interactions between the components. It must also capture the non-linear dependence of electrical conductivity on nano-additive volume. An important aspect of this work is designing chemistries from small training datasets. In that setting modeling uncertainty is high, and the data may capture only a limited part of the underlying physics. We explore two components of this problem. First, Uniform Manifold Approximation and Projection (UMAP) is used to assess the variability introduced by new data points, that is, how much new information each data point contributes, which matters significantly more than the raw dataset size. Second, multiple training/testing splits are evaluated to ensure that the results are statistically meaningful rather than an artifact of one particular split. This work will accelerate the rational design of polymer nano-composites by fully considering the large array of possible variables while enabling high-speed screening of polymer chemistries.
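The second component, checking that results hold across many training/testing splits, can be illustrated with a minimal sketch. It assumes a generic regressor. The helper names (`fit_linear`, `rmse`, `repeated_split_scores`) and the one-feature least-squares model are hypothetical stand-ins, not the authors' QSAR. The sketch scores a model on many independent random partitions and reports the spread of the error rather than a single number from one split.

```python
import random
import statistics
from typing import Callable, List, Sequence

Predictor = Callable[[float], float]
Fitter = Callable[[Sequence[float], Sequence[float]], Predictor]


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Hypothetical stand-in model: 1-D ordinary least squares, y = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # guard against a constant feature
    a = my - b * mx
    return lambda x: a + b * x


def rmse(pred: Predictor, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Root-mean-square error of pred on (xs, ys)."""
    return (sum((pred(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def repeated_split_scores(xs, ys, fit: Fitter, n_splits: int = 20,
                          test_frac: float = 0.2, seed: int = 0) -> List[float]:
    """Fit and score the model on n_splits independent random train/test partitions."""
    rng = random.Random(seed)  # seeded so the set of splits is reproducible
    idx = list(range(len(xs)))
    n_test = max(1, round(test_frac * len(xs)))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)  # a fresh random permutation defines each split
        test, train = idx[:n_test], idx[n_test:]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse(model, [xs[i] for i in test], [ys[i] for i in test]))
    return scores


if __name__ == "__main__":
    # Hypothetical toy data (NOT from the paper): a noisy linear trend used
    # only to show how the error varies from one split to the next.
    noise = random.Random(42)
    xs = [0.01 * i for i in range(1, 31)]
    ys = [-12.0 + 200.0 * x + noise.gauss(0.0, 0.5) for x in xs]
    scores = repeated_split_scores(xs, ys, fit_linear, n_splits=10)
    print(f"RMSE over {len(scores)} splits: mean={statistics.fmean(scores):.3g}, "
          f"stdev={statistics.stdev(scores):.3g}")
```

Reporting the mean and standard deviation across splits, rather than one test score, is what lets a small-data result be judged statistically meaningful. A single favorable split can look far better than the model really is.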