Liang Fan, Feng Xiao-jiang, Lowry Michael, Rabitz Herschel
Department of Chemistry, Princeton University, Princeton, NJ 08544, USA.
J Phys Chem B. 2005 Mar 31;109(12):5842-54. doi: 10.1021/jp045926y.
This paper describes an adaptive algorithm for interpolation over a library of molecules subjected to synthesis and property assaying. Starting with a coarse sampling of the library compounds, the algorithm finds the optimal substituent orderings on all of the functionalized scaffold sites to allow for accurate property interpolation over all remaining compounds in the full library space. A previous paper introduced the concept of substituent reordering and a smoothness-based criterion to search for optimal orderings (Shenvi, N.; Geremia, J. M.; Rabitz, H. J. Phys. Chem. A 2003, 107, 2066). Here, we propose a data-driven root-mean-squared (RMS) criterion and a combined RMS/smoothness criterion as alternative methods for the discovery of optimal substituent orderings. The propagation of errors in the property measurements of the sampled compounds is determined to provide confidence intervals on the interpolated molecular property values, and a substituent rescaling technique is introduced to manage poorly designed/sampled libraries. Finally, various factors that can influence the applicability and interpolation quality of the algorithm are explored. An adaptive methodology is proposed to iteratively and efficiently use laboratory experiments to optimize these algorithmic factors, so that the accuracy of property predictions is maximized. The enhanced algorithm is tested on copolymer and transition metal complex libraries, and the results demonstrate the capability of the algorithm to accurately interpolate various properties of both molecular libraries.
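The abstract does not spell out the interpolation scheme or the exact form of the RMS criterion, so the following is only a minimal toy sketch under stated assumptions, not the authors' implementation. It assumes a two-site library indexed by substituent labels, linear interpolation between the nearest sampled neighbours along each site in the current ordering (per-site estimates averaged), an RMS criterion taken to be the leave-one-out prediction error over the sampled compounds, and exhaustive search over orderings. It also shows how independent measurement errors propagate through a linear estimator, Var = Σ w_i² σ_i². All function names (`interpolation_weights`, `predict`, `rms_criterion`, `best_orderings`) and the toy data are hypothetical.

```python
import itertools
import math


def interpolation_weights(compound, data, pos1, pos2):
    """Return {sampled compound: weight} for a linear estimate of `compound`.

    Along each scaffold site, interpolate linearly between the nearest sampled
    neighbours (in the current ordering) that share the other site's
    substituent, then average the per-site estimates that exist. The target
    itself is always excluded, so this is leave-one-out for sampled compounds.
    """
    estimates = []
    for site, pos in ((0, pos1), (1, pos2)):
        x0 = pos[compound[site]]
        # Sampled compounds differing from `compound` only at this site.
        line = [(pos[c[site]], c) for c in data
                if c != compound and c[1 - site] == compound[1 - site]]
        left = [p for p in line if p[0] < x0]
        right = [p for p in line if p[0] > x0]
        if left and right:
            (xl, cl), (xr, cr) = max(left), min(right)
            t = (x0 - xl) / (xr - xl)
            estimates.append({cl: 1.0 - t, cr: t})
    if not estimates:
        return None  # not bracketed along either site
    weights = {}
    for est in estimates:
        for c, w in est.items():
            weights[c] = weights.get(c, 0.0) + w / len(estimates)
    return weights


def predict(compound, data, sigma, pos1, pos2):
    """Interpolated value and 1-sigma error, assuming independent measurement errors."""
    w = interpolation_weights(compound, data, pos1, pos2)
    if w is None:
        return None
    value = sum(wi * data[c] for c, wi in w.items())
    # Linear estimator: Var = sum_i w_i^2 * sigma_i^2.
    std = math.sqrt(sum((wi * sigma[c]) ** 2 for c, wi in w.items()))
    return value, std


def rms_criterion(data, pos1, pos2):
    """Leave-one-out RMS error over the sampled compounds that can be predicted."""
    errors = []
    for c, y in data.items():
        w = interpolation_weights(c, data, pos1, pos2)
        if w is not None:
            errors.append(sum(wi * data[k] for k, wi in w.items()) - y)
    if not errors:
        return math.inf
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def best_orderings(data, n1, n2):
    """Exhaustive search over orderings on both sites (tiny libraries only)."""
    best = (math.inf, None, None)
    for p1 in itertools.permutations(range(n1)):
        pos1 = {s: i for i, s in enumerate(p1)}
        for p2 in itertools.permutations(range(n2)):
            pos2 = {s: i for i, s in enumerate(p2)}
            score = rms_criterion(data, pos1, pos2)
            if score < best[0]:
                best = (score, pos1, pos2)
    return best


# Hypothetical 4 x 4 library: additive property, substituent labels scrambled.
h1, h2 = [3, 0, 2, 1], [0, 2, 1, 3]
held_out = {(3, 2), (2, 1)}
data = {(a, b): h1[a] + h2[b] for a in range(4) for b in range(4)
        if (a, b) not in held_out}
sigma = {c: 0.1 for c in data}  # assumed uniform measurement error
score, pos1, pos2 = best_orderings(data, 4, 4)
for c in sorted(held_out):
    print(c, predict(c, data, sigma, pos1, pos2))
```

Reversing an ordering leaves this interpolant unchanged, so the criterion can identify orderings only up to reversal. The exhaustive loop grows as n1! * n2!, so anything beyond a handful of substituents per site would need a heuristic search instead.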