Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias.

Affiliation

Cavendish Laboratory, University of Cambridge, Cambridge, UK.

Publication information

Nat Commun. 2021 Mar 16;12(1):1695. doi: 10.1038/s41467-021-21895-w.

Abstract

Organic synthesis remains a major challenge in drug discovery. Although a plethora of machine learning models have been proposed as solutions in the literature, they suffer from being opaque black-boxes. It is neither clear if the models are making correct predictions because they inferred the salient chemistry, nor is it clear which training data they are relying on to reach a prediction. This opaqueness hinders both model developers and users. In this paper, we quantitatively interpret the Molecular Transformer, the state-of-the-art model for reaction prediction. We develop a framework to attribute predicted reaction outcomes both to specific parts of reactants, and to reactions in the training set. Furthermore, we demonstrate how to retrieve evidence for predicted reaction outcomes, and understand counterintuitive predictions by scrutinising the data. Additionally, we identify Clever Hans predictions where the correct prediction is reached for the wrong reason due to dataset bias. We present a new debiased dataset that provides a more realistic assessment of model performance, which we propose as the new standard benchmark for comparing reaction prediction models.

