An Ensemble Structure and Physicochemical (SPOC) Descriptor for Machine-Learning Prediction of Chemical Reaction and Molecular Properties.

Affiliations

Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084, Beijing, China.

Haihe Laboratory of Sustainable Chemical Transformations, 300192, Tianjin, China.

Publication information

Chemphyschem. 2022 Jul 19;23(14):e202200255. doi: 10.1002/cphc.202200255. Epub 2022 May 19.

Abstract

Feature representations, or descriptors, are the chemical language of machines, and they largely shape the prediction capability, generalizability, and interpretability of machine learning models. A generally applicable descriptor is therefore highly desirable for chemists who must handle conventional prediction tasks with small, sparsely distributed datasets. Inspired by how chemists view molecules, we present herein SPOC, an ensemble descriptor curated on the principles of physical organic chemistry that integrates the Structure and Physicochemical properties (SPOC) of a molecule. SPOC can be readily constructed by combining molecular fingerprints, which represent the structure of a given molecule, with molecular physicochemical properties extracted from RDKit or Mordred molecular descriptors. The applicability of SPOC was extensively surveyed on a range of well-structured chemical databases, with machine learning tasks ranging from regression to classification.
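The construction described above amounts to concatenating a structural bit vector with a vector of physicochemical properties. The sketch below illustrates that idea in plain Python under stated assumptions: the fingerprint bits and descriptor values are taken as already computed (in practice they might come from an RDKit fingerprint and from RDKit or Mordred descriptors), and the z-score scaling of the descriptor block, along with the helper names `fit_scaler` and `spoc_vector`, are hypothetical choices made for this example, not details taken from the paper.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_scaler(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-column mean and population std of descriptor rows (fit on training data only).

    Hypothetical helper: the source does not say whether descriptors are scaled.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has std 0.0; use 1.0 so it is scaled to 0 instead of dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def spoc_vector(fingerprint_bits: Sequence[int],
                descriptors: Sequence[float],
                means: Sequence[float],
                stds: Sequence[float]) -> List[float]:
    """Concatenate a structural fingerprint with z-scored physicochemical descriptors."""
    if not (len(descriptors) == len(means) == len(stds)):
        raise ValueError("descriptor, mean and std lengths must match")
    structure = [float(b) for b in fingerprint_bits]  # structure block (0/1 bits)
    physchem = [(x - m) / s for x, m, s in zip(descriptors, means, stds)]  # property block
    return structure + physchem


if __name__ == "__main__":
    # Toy, made-up descriptor rows for three "training" molecules (two descriptors each).
    means, stds = fit_scaler([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
    print(spoc_vector([1, 0, 1], [3.0, 10.0], means, stds))  # [1.0, 0.0, 1.0, 0.0, 0.0]
```

Fitting the scaling statistics on the training set only keeps test molecules from leaking into the feature scaling; whether the original work standardizes the descriptor block at all is not stated in the abstract.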
