Voršilák Milan, Kolář Michal, Čmelo Ivan, Svozil Daniel
CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague 6, Czech Republic.
CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the Czech Academy of Sciences, Vídeňská 1083, 142 20, Prague 4, Czech Republic.
J Cheminform. 2020 May 20;12(1):35. doi: 10.1186/s13321-020-00439-2.
SYBA (SYnthetic Bayesian Accessibility) is a fragment-based method for the rapid classification of organic compounds as easy-to-synthesize (ES) or hard-to-synthesize (HS). It is based on a Bernoulli naïve Bayes classifier that assigns a SYBA score contribution to each fragment according to its frequency in databases of ES and HS molecules. SYBA was trained on ES molecules available in the ZINC15 database and on HS molecules generated by the Nonpher methodology. SYBA was compared with a random forest, which served as a baseline method, and with two other methods for synthetic accessibility assessment: SAScore and SCScore. When each method is used with its suggested threshold, SYBA improves on random forest classification, albeit marginally, and outperforms SAScore and SCScore. However, once the SAScore threshold is optimized (it changes from 6.0 to approximately 4.5), SAScore yields results similar to those of SYBA. Because SYBA is based solely on fragment contributions, it can be used to analyze how individual parts of a molecule contribute to the compound's synthetic accessibility. SYBA is publicly available at https://github.com/lich-uct/syba under the GNU General Public License.
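The fragment-scoring idea described in the abstract can be illustrated with a minimal sketch of a Bernoulli naïve Bayes log-odds score. This is not the SYBA implementation: the fragment names, counts, the smoothing constant `alpha`, and the function names `fragment_scores` and `score_molecule` are all hypothetical, and equal class priors are assumed so that the prior term vanishes.

```python
import math

def fragment_scores(es_counts, hs_counts, n_es, n_hs, alpha=1.0):
    """Per-fragment log-odds contributions (hypothetical helper, not SYBA's API).

    For each fragment, returns a pair:
      (contribution if the fragment is present, contribution if it is absent).
    `alpha` is an assumed Laplace smoothing constant.
    """
    scores = {}
    for frag in set(es_counts) | set(hs_counts):
        # Smoothed probability that a molecule of each class contains the fragment
        p_es = (es_counts.get(frag, 0) + alpha) / (n_es + 2 * alpha)
        p_hs = (hs_counts.get(frag, 0) + alpha) / (n_hs + 2 * alpha)
        present = math.log(p_es / p_hs)
        absent = math.log((1 - p_es) / (1 - p_hs))
        scores[frag] = (present, absent)
    return scores

def score_molecule(fragments, scores):
    """Sum the contributions over all known fragments (Bernoulli model:
    absent fragments contribute too). Positive totals lean ES, negative lean HS."""
    total = 0.0
    for frag, (present, absent) in scores.items():
        total += present if frag in fragments else absent
    return total

# Toy, made-up counts: number of molecules (out of 100 per class) containing each fragment
es_counts = {"amide": 60, "spiro_bridge": 2}
hs_counts = {"amide": 30, "spiro_bridge": 50}
scores = fragment_scores(es_counts, hs_counts, n_es=100, n_hs=100)

easy_like = score_molecule({"amide"}, scores)
hard_like = score_molecule({"spiro_bridge"}, scores)
print(f"amide only: {easy_like:+.2f}, spiro_bridge only: {hard_like:+.2f}")
```

Because the total is a plain sum, the entries of `scores` can be inspected directly to see which fragments push a molecule toward ES or HS, which mirrors the per-fragment analysis the abstract mentions.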