School of Chemical Engineering and Materials Science, Tianjin University of Science and Technology, Tianjin 300457, P. R. China.
Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai 200240, P. R. China.
J Chem Inf Model. 2023 Feb 27;63(4):1177-1187. doi: 10.1021/acs.jcim.2c01389. Epub 2023 Jan 18.
A unique structure representation of polymers plays a crucial role in developing models for polymer property prediction and polymer design by data-centric approaches. Currently, monomer and repeating unit (RU) approximations are widely used to represent polymer structures when generating feature descriptors for quantitative structure-property relationship (QSPR) modeling. However, such conventional representations may not uniquely describe heterochain polymers, because the same polymer can arise from different monomer combinations and can be written with more than one RU. In this study, the so-called ring repeating unit (RRU) method, which can uniquely represent polymers across a broad range of structural diversity, is proposed for the first time. As a proof of concept, an RRU-based QSPR model was developed to predict the glass transition temperature (Tg) of polyimides (PIs) with deterministic values. Comprehensive model validations, including external, internal, and Y-random validations, were performed. In addition, an RU-based QSPR model developed from the same large database of 1321 PIs gives nonunique prediction results, which further demonstrates the necessity of RRU-based structure representation. Promising results obtained with the RRU-based model confirm that the RRU method provides an effective representation that accurately captures the sequence of repeat units and thus enables reliable polymer property prediction by data-driven approaches.
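The nonuniqueness of a linear RU can be illustrated with a toy sketch. This is only an analogy using letter tokens, not the RRU algorithm of the paper, which operates on molecular structures. It assumes that a periodic chain can be cut at any position to give a valid RU, and that reading a unit as a closed ring makes all of those cuts equivalent. The function names `linear_units` and `ring_key` are hypothetical, and reading direction is ignored for simplicity.

```python
def linear_units(chain_period):
    """Return every linear repeating unit that describes the same periodic chain.

    Cutting the chain ...ABCABCABC... at a different position gives a
    different string (ABC, BCA, CAB), although the polymer is the same.
    """
    n = len(chain_period)
    return [chain_period[i:] + chain_period[:i] for i in range(n)]


def ring_key(unit):
    """Return a canonical key for a unit read as a closed ring.

    Assumption for this toy example: the ring is represented by the
    lexicographically smallest rotation of the token string, so every
    cut of the same chain maps to the same key.
    """
    return min(unit[i:] + unit[:i] for i in range(len(unit)))


units = linear_units("ABC")
keys = {ring_key(u) for u in units}

print(units)  # three different linear units for one chain
print(keys)   # a single ring key shared by all of them
```

In this sketch, a descriptor computed from the linear strings would depend on which cut was chosen, whereas a descriptor computed from the ring key would not. That is the kind of ambiguity the abstract attributes to RU-based models.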