Automated BigSMILES conversion workflow and dataset for homopolymeric macromolecules.

Affiliations

School of Electrical Engineering, Korea University, Seoul, South Korea.

Department of Materials Science and Engineering, Korea University, Seoul, South Korea.

Publication information

Sci Data. 2024 Apr 11;11(1):371. doi: 10.1038/s41597-024-03212-4.

Abstract

The simplified molecular-input line-entry system (SMILES) has been used in a variety of artificial intelligence analyses because it represents chemical structures in line notation. However, it is limited in how easily it can represent macromolecules, which led to the proposal of BigSMILES as an alternative suited to them. Nevertheless, research on BigSMILES remains limited because of its preprocessing requirements. This study therefore proposes a BigSMILES conversion workflow, focusing on automated generation from SMILES representations of homopolymers. BigSMILES representations for 4,927,181 records are provided, enabling their immediate use in research and development applications. The study describes in detail a validation process that ensures the accuracy, interchangeability, and robustness of the conversion. It also provides a systematic overview of the codes and functions used, emphasizing their relevance to BigSMILES generation. This advancement is anticipated to aid researchers and facilitate further studies of BigSMILES representation, including potential applications in deep learning and extension to complex structures such as copolymers.
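To make the idea of generating BigSMILES from a homopolymer SMILES concrete, here is a minimal Python sketch. It is not the authors' workflow, which the abstract does not detail. It assumes the repeat unit is given as a SMILES string with exactly two wildcard atoms (`*` or `[*]`) marking the connection points, and it maps both to the undirected BigSMILES bonding descriptor `[$]` inside a stochastic object. The function name `psmiles_to_bigsmiles` is hypothetical, and the sketch does no chemical validation.

```python
import re

# Matches a wildcard atom written as "[*]" or as a bare "*".
# A "*" that directly follows "[" (e.g. "[*:1]") is deliberately not matched,
# so atom-mapped wildcards are rejected instead of being mangled.
WILDCARD = re.compile(r"\[\*\]|(?<!\[)\*")


def psmiles_to_bigsmiles(psmiles: str) -> str:
    """Wrap a two-wildcard repeat unit in a BigSMILES stochastic object.

    Assumption (not from the source): both connection points are
    equivalent, so the undirected descriptor "[$]" is used for each.
    """
    hits = WILDCARD.findall(psmiles)
    if len(hits) != 2:
        raise ValueError(
            f"expected exactly 2 wildcard atoms, found {len(hits)}: {psmiles!r}"
        )
    repeat_unit = WILDCARD.sub("[$]", psmiles)
    # "{[] ... []}" is a stochastic object with empty terminal descriptors.
    return "{[]" + repeat_unit + "[]}"


if __name__ == "__main__":
    print(psmiles_to_bigsmiles("*CC*"))            # polyethylene repeat unit
    print(psmiles_to_bigsmiles("*CC(*)c1ccccc1"))  # polystyrene repeat unit
```

A string-level substitution like this ignores head-to-tail directionality (the `[<]` and `[>]` descriptors) and does not check that the input is chemically valid, which is the kind of issue a validated workflow would need to handle.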

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e05a/11009387/c48e9b0dab23/41597_2024_3212_Fig1_HTML.jpg
