Development of a novel fingerprint for chemical reactions and its application to large-scale reaction classification and similarity.

Author information

Schneider Nadine, Lowe Daniel M, Sayle Roger A, Landrum Gregory A

Affiliation

Novartis Institutes for BioMedical Research, Novartis Campus, 4002 Basel, Switzerland.

Publication information

J Chem Inf Model. 2015 Jan 26;55(1):39-53. doi: 10.1021/ci5006614. Epub 2015 Jan 13.

Abstract

Fingerprint methods applied to molecules have proven to be useful for similarity determination and as inputs to machine-learning models. Here, we present the development of a new fingerprint for chemical reactions and validate its usefulness in building machine-learning models and in similarity assessment. Our final fingerprint is constructed as the difference of the atom-pair fingerprints of products and reactants and includes agents via calculated physicochemical properties. We validated the fingerprints on a large data set of reactions text-mined from granted United States patents from the last 40 years that have been classified using a substructure-based expert system. We applied machine learning to build a 50-class predictive model for reaction-type classification that correctly predicts 97% of the reactions in an external test set. Impressive accuracies were also observed when applying the classifier to reactions from an in-house electronic laboratory notebook. The performance of the novel fingerprint for assessing reaction similarity was evaluated by a cluster analysis that recovered 48 out of 50 of the reaction classes with a median F-score of 0.63 for the clusters. The data sets used for training and primary validation as well as all Python scripts required to reproduce the analysis are provided in the Supporting Information.
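
The core construction described in the abstract, a reaction fingerprint taken as the difference between the atom-pair fingerprints of products and reactants, can be illustrated with a small sketch. This is not the authors' code (their scripts are in the Supporting Information); it is a simplified, dependency-free illustration. It assumes a deliberately reduced atom-pair definition in which a feature is just (element, topological distance, element), whereas published atom-pair fingerprints also encode properties such as neighbor counts. The function names, the molecule representation (a list of element symbols plus a bond list), and the esterification example are all hypothetical choices made for illustration, and the agent contribution via physicochemical properties is left out.

```python
from collections import Counter, deque


def topological_distances(n_atoms, bonds, start):
    """Breadth-first search: bond-count distance from `start` to each reachable atom."""
    neighbors = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in neighbors[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


def atom_pair_counts(atoms, bonds):
    """Count simplified atom-pair features (element, distance, element).

    Assumption: atoms are typed by element symbol only, which is coarser
    than the atom typing used in real atom-pair fingerprints.
    """
    counts = Counter()
    for i in range(len(atoms)):
        dist = topological_distances(len(atoms), bonds, i)
        for j, d in dist.items():
            if j > i:  # visit each unordered pair once
                first, second = sorted((atoms[i], atoms[j]))
                counts[(first, d, second)] += 1
    return counts


def difference_fingerprint(reactants, products):
    """Product feature counts minus reactant feature counts, dropping zeros."""
    diff = Counter()
    for atoms, bonds in products:
        diff.update(atom_pair_counts(atoms, bonds))
    for atoms, bonds in reactants:
        diff.subtract(atom_pair_counts(atoms, bonds))
    return {feature: n for feature, n in diff.items() if n != 0}


# Heavy-atom graphs for a hypothetical esterification example:
# ethanol + acetic acid -> ethyl acetate + water
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
acetic_acid = (["C", "C", "O", "O"], [(0, 1), (1, 2), (1, 3)])
ethyl_acetate = (
    ["C", "C", "O", "O", "C", "C"],
    [(0, 1), (1, 2), (1, 3), (3, 4), (4, 5)],
)
water = (["O"], [])

esterification_fp = difference_fingerprint(
    reactants=[ethanol, acetic_acid],
    products=[ethyl_acetate, water],
)
for feature, count in sorted(esterification_fp.items()):
    print(feature, count)
```

In this toy example, features shared unchanged by both sides of the reaction cancel, so only the atom pairs created or destroyed by the transformation remain. That cancellation is what lets a difference fingerprint describe the reaction itself and not the molecules taking part in it.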
