PI1M: A Benchmark Database for Polymer Informatics.

Affiliations

Department of Aerospace and Mechanical Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States.

Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States.

Publication information

J Chem Inf Model. 2020 Oct 26;60(10):4684-4690. doi: 10.1021/acs.jcim.0c00726. Epub 2020 Oct 8.

DOI: 10.1021/acs.jcim.0c00726

PMID: 32986418

Abstract

Large-scale open-source data are the cornerstone of data-driven research, but they are not readily available for polymers. In this work, we build a benchmark database, called PI1M (referring to ∼1 million polymers for polymer informatics), to provide a data resource for machine learning research in polymer informatics. A generative model is trained on ∼12 000 polymers manually collected from PolyInfo, the largest existing polymer database, and the model is then used to generate ∼1 million polymers. A new representation for polymers, polymer embedding (PE), is introduced and used to perform several polymer informatics regression tasks: predicting density, glass transition temperature, melting temperature, and dielectric constant. By comparing the PE trained on the PolyInfo data with the PE trained on the PI1M data, we conclude that the PI1M database covers a chemical space similar to that of PolyInfo, but significantly populates regions where PolyInfo data are sparse. We believe that PI1M will serve as a good benchmark database for future research in polymer informatics.
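
The train-then-generate workflow described in the abstract can be illustrated with a toy sketch. It is not the authors' model. As assumptions made purely for illustration, it uses a character-level bigram Markov chain in place of the generative model, five hand-written repeat units in place of the PolyInfo training set, and a syntactic filter in place of a real chemical validity check. All function names are hypothetical.

```python
import random
from collections import defaultdict

# Toy training set (assumption, not PolyInfo data): repeat units written as
# SMILES strings, with "*" marking the two attachment points.
TRAIN = [
    "*CC*",            # polyethylene
    "*CC(C)*",         # polypropylene
    "*CC(Cl)*",        # poly(vinyl chloride)
    "*CC(c1ccccc1)*",  # polystyrene
    "*CCO*",           # poly(ethylene oxide)
]

START, END = "^", "$"  # sentinel symbols that never occur in the strings


def fit_bigram(strings):
    """For each character, collect the characters observed right after it."""
    table = defaultdict(list)
    for s in strings:
        chars = [START, *s, END]
        for a, b in zip(chars, chars[1:]):
            table[a].append(b)
    return table


def sample(table, rng, max_len=40):
    """Draw one string by walking the bigram table from START until END."""
    out, cur = [], START
    while len(out) < max_len:
        cur = rng.choice(table[cur])
        if cur == END:
            break
        out.append(cur)
    return "".join(out)


def looks_like_repeat_unit(s):
    """Syntactic filter only: two '*' end points, balanced parentheses,
    and ring-closure digits that occur in pairs. This does NOT establish
    chemical validity."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    rings_paired = all(s.count(d) % 2 == 0 for d in "123456789")
    ends_marked = s.count("*") == 2 and s.startswith("*") and s.endswith("*")
    return depth == 0 and rings_paired and ends_marked


rng = random.Random(0)  # fixed seed so the run is repeatable
table = fit_bigram(TRAIN)
candidates = {sample(table, rng) for _ in range(2000)}
generated = sorted(c for c in candidates if looks_like_repeat_unit(c))
novel = [c for c in generated if c not in TRAIN]

print(len(candidates), "distinct samples,", len(generated), "pass the filter,",
      len(novel), "are not in the training set")
```

The sample, filter, and deduplicate pattern is the part that carries over to a real pipeline. There, a cheminformatics toolkit would replace the syntactic filter, and a far more expressive generative model would replace the bigram table.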

Similar articles

Large-Scale Glass-Transition Temperature Prediction with an Equivariant Neural Network for Screening Polymers.

ACS Omega. 2024 Jan 26;9(5):5452-5462. doi: 10.1021/acsomega.3c06843. eCollection 2024 Feb 6.

Evaluating Polymer Representations via Quantifying Structure-Property Relationships.

J Chem Inf Model. 2019 Jul 22;59(7):3110-3119. doi: 10.1021/acs.jcim.9b00358. Epub 2019 Jul 3.

Benchmarking Machine Learning Models for Polymer Informatics: An Example of Glass Transition Temperature.

J Chem Inf Model. 2021 Nov 22;61(11):5395-5413. doi: 10.1021/acs.jcim.1c01031. Epub 2021 Oct 18.

Machine learning discovery of high-temperature polymers.

Patterns (N Y). 2021 Mar 26;2(4):100225. doi: 10.1016/j.patter.2021.100225. eCollection 2021 Apr 9.

Dielectric Polymers Tolerant to Electric Field and Temperature Extremes: Integration of Phenomenology, Informatics, and Experimental Validation.

ACS Appl Mater Interfaces. 2021 Nov 17;13(45):53416-53424. doi: 10.1021/acsami.1c11885. Epub 2021 Aug 26.

Predicting Polymers' Glass Transition Temperature by a Chemical Language Processing Model.

Polymers (Basel). 2021 Jun 7;13(11):1898. doi: 10.3390/polym13111898.

Machine-Learning-Based Predictive Modeling of Glass Transition Temperatures: A Case of Polyhydroxyalkanoate Homopolymers and Copolymers.

J Chem Inf Model. 2019 Dec 23;59(12):5013-5025. doi: 10.1021/acs.jcim.9b00807. Epub 2019 Nov 22.

High-Temperature Polymer Dielectrics Designed Using an Invertible Molecular Graph Generative Model.

J Chem Inf Model. 2023 Dec 25;63(24):7669-7675. doi: 10.1021/acs.jcim.3c01572. Epub 2023 Dec 7.

Multi-Cover Persistence (MCP)-based machine learning for polymer property prediction.

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae465.

Cited by

Machine learning-driven generation and screening of potential ionic liquids for cellulose dissolution.

J Cheminform. 2025 May 21;17(1):78. doi: 10.1186/s13321-025-01018-z.

Local reaction condition optimization via machine learning.

J Mol Model. 2025 Apr 23;31(5):143. doi: 10.1007/s00894-025-06365-0.

Functional monomer design for synthetically accessible polymers.

Chem Sci. 2025 Feb 13;16(11):4755-4767. doi: 10.1039/d4sc08617a. eCollection 2025 Mar 12.

Machine Learning in Polymer Research.

Adv Mater. 2025 Mar;37(11):e2413695. doi: 10.1002/adma.202413695. Epub 2025 Feb 9.

Inverse design of copolymers including stoichiometry and chain architecture.

Chem Sci. 2024 Dec 17;16(3):1161-1178. doi: 10.1039/d4sc05900j. eCollection 2025 Jan 15.

Data science-centric design, discovery, and evaluation of novel synthetically accessible polyimides with desired dielectric constants.

Chem Sci. 2024 Oct 4;15(43):18099-18110. doi: 10.1039/d4sc05000b.

Machine learning-guided strategies for reaction conditions design and optimization.

Beilstein J Org Chem. 2024 Oct 4;20:2476-2492. doi: 10.3762/bjoc.20.212. eCollection 2024.

AI-assisted discovery of high-temperature dielectrics for energy storage.

Nat Commun. 2024 Jul 19;15(1):6107. doi: 10.1038/s41467-024-50413-x.

Machine-Guided Discovery of Acrylate Photopolymer Compositions.

ACS Appl Mater Interfaces. 2024 Apr 10;16(14):17992-18000. doi: 10.1021/acsami.4c00759. Epub 2024 Mar 27.

Calculating Pairwise Similarity of Polymer Ensembles via Earth Mover's Distance.

ACS Polym Au. 2024 Jan 10;4(1):66-76. doi: 10.1021/acspolymersau.3c00029. eCollection 2024 Feb 14.
