Institute for Genomics and Bioinformatics and Department of Computer Science, University of California, Irvine, CA, USA.
J Chem Inf Model. 2012 Oct 22;52(10):2526-40. doi: 10.1021/ci3003039. Epub 2012 Oct 1.
Proposing reasonable mechanisms and predicting the course of chemical reactions is important to the practice of organic chemistry. Approaches to reaction prediction have historically used obfuscating representations and manually encoded patterns or rules. Here we present ReactionPredictor, a machine learning approach to reaction prediction that models elementary, mechanistic reactions as interactions between approximate molecular orbitals (MOs). The training data set consists of productive reactions known to occur at reasonable rates and yields, as verified by their inclusion in the literature or textbooks. It is derived from an existing rule-based system and expanded with reactions manually curated from graduate-level textbooks. Using this training data set of complex polar, hypervalent, radical, and pericyclic reactions, we train and validate a two-stage machine learning prediction framework. In the first stage, filtering models trained at the level of individual MOs reduce the space of possible reactions to consider. In the second stage, ranking models over the filtered space of possible reactions order the reactions so that the productive reactions are ranked highest. The resulting model, ReactionPredictor, perfectly ranks polar reactions 78.1% of the time and, when a small number of errors is allowed, recovers all productive reactions 95.7% of the time. Pericyclic and radical reactions are perfectly ranked 85.8% and 77.0% of the time, respectively, with recovery rising above 93% for both reaction types when a small number of errors is allowed. The choice of which reaction-type ranking model (polar, pericyclic, or radical) to apply can be made with >99% accuracy. Finally, for multistep reaction pathways, we implement the first mechanistic pathway predictor, which uses constrained tree search to discover a set of reasonable mechanistic steps leading from given reactants to given products.
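The two-stage filter-then-rank framework, together with the "perfect ranking" and "recovery with a small number of errors" figures, can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the abstract. Each elementary step is taken to pair one electron-donating ("source") MO with one electron-accepting ("sink") MO. The trained filtering and ranking models are treated as plain scoring functions. An "error" is counted as a non-productive step ranked above the lowest-ranked productive one. `Orbital`, `predict`, `errors_before_last_productive`, `recovery_rate`, and the 0.5 threshold are hypothetical names and values, not part of ReactionPredictor.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class Orbital:
    """Hypothetical stand-in for an approximate molecular orbital (MO)."""
    name: str
    role: str  # assumed: "source" (electron donor) or "sink" (electron acceptor)


# Assumption: one elementary step pairs one source MO with one sink MO.
Reaction = Tuple[Orbital, Orbital]


def predict(
    orbitals: Sequence[Orbital],
    source_score: Callable[[Orbital], float],
    sink_score: Callable[[Orbital], float],
    pair_score: Callable[[Reaction], float],
    threshold: float = 0.5,
) -> List[Reaction]:
    """Filter MOs individually, then rank the surviving source/sink pairs."""
    # Stage 1: per-MO filters prune the combinatorial space of candidate steps.
    sources = [o for o in orbitals if o.role == "source" and source_score(o) >= threshold]
    sinks = [o for o in orbitals if o.role == "sink" and sink_score(o) >= threshold]
    # Stage 2: score every remaining source/sink pair and sort best-first.
    return sorted(product(sources, sinks), key=pair_score, reverse=True)


def errors_before_last_productive(
    ranked: Sequence[Reaction], productive: Set[Reaction]
) -> Optional[int]:
    """Count non-productive steps ranked above the lowest-ranked productive one.

    Returns None if a productive step was removed in stage 1 and never ranked.
    """
    positions = [i for i, r in enumerate(ranked) if r in productive]
    if len(positions) < len(productive):
        return None
    last = max(positions, default=-1)
    return (last + 1) - len(productive)


def recovery_rate(
    results: Sequence[Tuple[Sequence[Reaction], Set[Reaction]]], max_errors: int = 0
) -> float:
    """Fraction of examples whose productive steps all appear within max_errors.

    max_errors=0 corresponds to a perfect ranking.
    """
    hits = 0
    for ranked, productive in results:
        errors = errors_before_last_productive(ranked, productive)
        if errors is not None and errors <= max_errors:
            hits += 1
    return hits / len(results)


if __name__ == "__main__":
    mos = [Orbital("lone pair N", "source"), Orbital("pi C=C", "source"),
           Orbital("sigma* C-Br", "sink"), Orbital("pi* C=O", "sink")]
    # Hand-set toy scores standing in for trained filtering and ranking models.
    src = {"lone pair N": 0.9, "pi C=C": 0.4}
    snk = {"sigma* C-Br": 0.8, "pi* C=O": 0.7}
    pair = {("lone pair N", "pi* C=O"): 0.95, ("lone pair N", "sigma* C-Br"): 0.6}
    ranked = predict(mos, lambda o: src[o.name], lambda o: snk[o.name],
                     lambda r: pair[(r[0].name, r[1].name)])
    print([(s.name, t.name) for s, t in ranked])
    # -> [('lone pair N', 'pi* C=O'), ('lone pair N', 'sigma* C-Br')]
    print(recovery_rate([(ranked, {ranked[0]})]))  # -> 1.0
```

Filtering before ranking pays off because the number of candidate pairs grows as the product of the source and sink counts. Pruning individual MOs first shrinks the space the pair ranker has to order, at the cost that a productive step removed in stage 1 can never be recovered in stage 2.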
Webserver implementations of both the single step and pathway versions of ReactionPredictor are available via the chemoinformatics portal http://cdb.ics.uci.edu/.
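The pathway version's constrained tree search can be pictured as a search over repeated single-step predictions. The sketch below is an illustration, not the published algorithm. It assumes the constraints are a beam, meaning only the top few ranked steps are expanded at each node, and a maximum depth. It represents a state as a hypothetical set of species labels and treats `expand` as a stand-in for a trained single-step ranker. `find_pathway`, `beam`, and `max_depth` are illustrative names.

```python
from collections import deque
from typing import Callable, FrozenSet, Iterable, List, Optional, Sequence, Tuple

# Hypothetical state: the set of species present, e.g. as SMILES strings.
State = FrozenSet[str]
# Hypothetical ranked single-step predictor: (step label, resulting state), best-first.
Expander = Callable[[State], Sequence[Tuple[str, State]]]


def find_pathway(
    reactants: Iterable[str],
    products: Iterable[str],
    expand: Expander,
    beam: int = 3,
    max_depth: int = 5,
) -> Optional[List[str]]:
    """Breadth-first search over mechanistic steps, constrained by beam and depth.

    Only the top `beam` ranked steps are expanded at each node, paths are not
    extended beyond `max_depth` steps, and already-visited states are skipped.
    Returns the fewest-step sequence found in the pruned space, or None.
    """
    start: State = frozenset(reactants)
    goal: State = frozenset(products)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, path = queue.popleft()
        if goal <= state:  # every target product is present
            return path
        if len(path) >= max_depth:
            continue
        for label, nxt in list(expand(state))[:beam]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [label]))
    return None


if __name__ == "__main__":
    # Toy, hand-written reaction network standing in for a trained ranker.
    network = {
        frozenset({"A", "B"}): [("A+B->C", frozenset({"C"})), ("A->D", frozenset({"B", "D"}))],
        frozenset({"C"}): [("C->E", frozenset({"E"}))],
    }
    print(find_pathway({"A", "B"}, {"E"}, lambda s: network.get(s, [])))
    # -> ['A+B->C', 'C->E']
```

Tightening `beam` or `max_depth` trades completeness for speed. A productive step ranked outside the beam is never explored, so the search can miss a pathway that a looser setting would find.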