Beatson Drug Discovery Unit, Cancer Research UK Beatson Institute, Garscube Estate, Switchback Road, Bearsden, Glasgow, G61 1BD, U.K.
BioAscent Discovery Ltd., Bo'Ness Road, Newhouse, Lanarkshire ML1 5UH, U.K.
J Chem Inf Model. 2021 Jun 28;61(6):2547-2559. doi: 10.1021/acs.jcim.0c01226. Epub 2021 May 24.
Fragment-based hit identification (FBHI) allows proportionately greater coverage of chemical space using fewer molecules than traditional high-throughput screening approaches. However, effectively exploiting this advantage is highly dependent on the library design. Solubility, stability, chemical complexity, chemical/shape diversity, and synthetic tractability for fragment elaboration are all critical aspects, and molecule design remains a time-consuming task for computational and medicinal chemists. Artificial neural networks have attracted considerable attention in automated design applications and could also prove useful for fragment library design. Chemical autoencoders are neural networks consisting of encoder and decoder parts, which respectively compress and decompress molecular representations. The decoder is applied to samples drawn from the space of compressed representations to generate novel molecules that can be scored for properties of interest. Here, we report an autoencoder model using a recurrent neural network architecture, which was trained using 486,565 fragments curated from commercial sources, to simultaneously reconstruct both SMILES and chemical fingerprints. To explore its utility in fragment design, we applied transfer learning to the fingerprint decoder layers to train a classifier using 66 frequent hitter fragments identified from our screening campaigns. Using a particle swarm optimization sampling approach, we compare the performance of this "dual" model to an architecture encoding SMILES only. The dual model produced valid SMILES with improved features, considering a range of properties including aromatic ring counts, heavy atom count, synthetic accessibility, and a new fragment complexity score we term Feature Complexity (FeCo). 
Additionally, we demonstrate that generative performance is further enhanced by use of a simple syntax-correction procedure during training, in which invalid and undesirable SMILES are spiked into the training set. Finally, we used the syntax-corrected model to generate a library of novel candidate privileged fragments.
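The particle swarm optimization sampling step mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function `pso_minimize`, its parameter values, and the `toy_objective` are all hypothetical, and the toy objective merely stands in for whatever score would be computed on a molecule decoded from a latent vector. The sketch only shows how a swarm of candidate points in a continuous space is steered toward regions with better scores.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5,
                 bounds=(-1.0, 1.0), seed=0):
    """Minimize `objective` over a box using a basic particle swarm.

    All defaults here are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    lo, hi = bounds

    # Random starting positions, zero starting velocities.
    positions = [[rng.uniform(lo, hi) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers its own best position and score.
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]

    # The swarm shares the best position found so far.
    best_idx = min(range(n_particles), key=personal_score.__getitem__)
    global_best = personal_best[best_idx][:]
    global_score = personal_score[best_idx]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, pull to swarm best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                    + c_global * r2 * (global_best[d] - positions[i][d])
                )
                # Move and clamp to the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))

            score = objective(positions[i])
            if score < personal_score[i]:
                personal_score[i] = score
                personal_best[i] = positions[i][:]
                if score < global_score:
                    global_score = score
                    global_best = positions[i][:]

    return global_best, global_score


def toy_objective(z):
    """Hypothetical stand-in for 'decode z, then score the molecule'."""
    target = [0.3, -0.2, 0.5]
    return sum((a - b) ** 2 for a, b in zip(z, target))


best_z, best_score = pso_minimize(toy_objective, dim=3)
print(best_z, best_score)
```

In a generative setting, each position would be a point in the autoencoder's compressed representation space, and the objective would decode that point and score the resulting molecule. Here the squared distance to a fixed target is used only so that the sketch runs without a trained model.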