PI1M: A Benchmark Database for Polymer Informatics.

Affiliations

Department of Aerospace and Mechanical Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States.

Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States.

Publication information

J Chem Inf Model. 2020 Oct 26;60(10):4684-4690. doi: 10.1021/acs.jcim.0c00726. Epub 2020 Oct 8.

Abstract

Large-scale open-source data are the cornerstone of data-driven research, but such data are not readily available for polymers. In this work, we build a benchmark database, called PI1M (referring to ∼1 million polymers for polymer informatics), to provide data resources for machine learning research in polymer informatics. A generative model is trained on ∼12 000 polymers manually collected from PolyInfo, the largest existing polymer database, and the model is then used to generate ∼1 million polymers. A new representation for polymers, polymer embedding (PE), is introduced and used to perform several polymer informatics regression tasks for density, glass transition temperature, melting temperature, and dielectric constants. By comparing the PE trained on the PolyInfo data with that trained on the PI1M data, we conclude that the PI1M database covers a chemical space similar to that of PolyInfo but significantly populates regions where PolyInfo data are sparse. We believe that PI1M will serve as a good benchmark database for future research in polymer informatics.
