Polishchuk Pavel, Madzhidov Timur, Gimadiev Timur, Bodrov Andrey, Nugmanov Ramil, Varnek Alexandre
Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic.
A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Odessa, Ukraine.
J Comput Aided Mol Des. 2017 Sep;31(9):829-839. doi: 10.1007/s10822-017-0044-3. Epub 2017 Jul 27.
We describe a novel approach to reaction representation in which a reaction is treated as a combination of two mixtures: a mixture of reactants and a mixture of products. Each mixture is in turn encoded using the previously reported simplex descriptors (SiRMS). The feature vector representing the two mixtures is obtained either by concatenating the product and reactant descriptors or by taking the difference between the descriptors of products and reactants. This reaction representation does not require explicit labeling of the reaction center. A rigorous "product-out" cross-validation (CV) strategy is proposed. Unlike the naïve "reaction-out" CV approach, which is based on a random selection of items, the proposed strategy provides a more realistic estimate of prediction accuracy for reactions resulting in novel products. The new methodology has been applied to model the rate constants of E2 reactions. Applying the fragment control applicability domain approach significantly increases the prediction accuracy of the models. The models obtained with the new "mixture" approach performed better than those requiring either explicit (Condensed Graph of Reaction) or implicit (reaction fingerprints) reaction center labeling.
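The two feature-vector constructions and the "product-out" split described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names (`mixture_descriptor`, `reaction_features`, `product_out_folds`) are hypothetical, the per-molecule vectors stand in for SiRMS descriptors, and summing them to describe a mixture is a placeholder assumption rather than the actual SiRMS mixture encoding. The fold assignment reflects one reading of "product-out", namely that all reactions sharing a product are kept in the same fold.

```python
from collections import defaultdict


def mixture_descriptor(component_vectors):
    """Combine per-molecule descriptor vectors into one mixture vector.

    Assumption: a plain element-wise sum is used here as a stand-in for
    the real SiRMS mixture descriptors.
    """
    return [sum(column) for column in zip(*component_vectors)]


def reaction_features(reactants, products, mode="concat"):
    """Build a reaction feature vector from the two mixtures.

    mode="concat": product descriptors followed by reactant descriptors.
    mode="diff":   product descriptors minus reactant descriptors.
    """
    r = mixture_descriptor(reactants)
    p = mixture_descriptor(products)
    if mode == "concat":
        return p + r
    if mode == "diff":
        return [pi - ri for pi, ri in zip(p, r)]
    raise ValueError("mode must be 'concat' or 'diff'")


def product_out_folds(product_keys, n_folds):
    """Assign reaction indices to folds so that reactions sharing a
    product key always land in the same fold (hypothetical helper)."""
    groups = defaultdict(list)
    for index, key in enumerate(product_keys):
        groups[key].append(index)
    folds = [[] for _ in range(n_folds)]
    # Place the largest product groups first, each into the smallest fold.
    for key in sorted(groups, key=lambda k: (-len(groups[k]), k)):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(groups[key])
    return folds


# Toy data: two reactant molecules, one product molecule, three descriptors each.
reactants = [[1, 0, 2], [0, 1, 0]]
products = [[1, 1, 1]]
concat_vector = reaction_features(reactants, products, mode="concat")
diff_vector = reaction_features(reactants, products, mode="diff")

# Five reactions labeled by the product they yield, split into two folds.
product_keys = ["A", "A", "B", "C", "B"]
folds = product_out_folds(product_keys, n_folds=2)
```

In a "reaction-out" split the five reactions would be shuffled individually, so a product seen in training could reappear in the test fold; grouping by product key prevents that, which is why the resulting accuracy estimate is expected to be more conservative.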