Department of Computer Science, Southern Methodist University, Dallas, Texas 75205, United States.
Department of Chemistry, Southern Methodist University, Dallas, Texas 75205, United States.
J Chem Inf Model. 2021 May 24;61(5):2159-2174. doi: 10.1021/acs.jcim.0c01355. Epub 2021 Apr 26.
In previous work, Srinivas et al. [2018, 10, 56] showed that implicit fingerprints capture ligands and proteins in a shared latent space, typically for virtual screening with collaborative filtering models applied to known bioactivity data. In this work, we extend these implicit fingerprints/descriptors using deep learning techniques to translate latent descriptors into discrete representations of molecules (SMILES), without explicitly optimizing for chemical properties. This allows new compounds to be designed from the latent representation of nearby proteins, thereby encoding druglike properties, including binding affinities to known proteins. The implicit descriptor method does not require any fingerprint similarity search, which frees it from bias arising from the empirical nature of fingerprint models [Srinivas, R.; 2018, 10, 56]. We evaluate the potentially novel drugs generated by our approach using the physical properties of druglike molecules and chemical complexity. Additionally, we assess the reliability of the predicted biological activity of the generated compounds using models of protein-ligand interaction, which help estimate the potential binding affinity of the designed compounds. We find that the generated compounds exhibit properties of chemically feasible compounds and are predicted to be excellent binders to known proteins. Finally, we analyze the diversity of the generated compounds using the Tanimoto distance and conclude that they are widely diverse.
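The diversity analysis mentioned above rests on the Tanimoto distance, which for two binary fingerprints A and B is 1 - |A ∩ B| / |A ∪ B|. The following is a minimal sketch of how a mean pairwise Tanimoto distance could be computed over a set of compounds. It is not the authors' pipeline: the function names are hypothetical, the fingerprints are toy sets of on-bit indices rather than fingerprints derived from real molecules, and the abstract does not state which fingerprint or summary statistic was used.

```python
from itertools import combinations


def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B| for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        # Assumed convention: two empty fingerprints count as identical.
        return 0.0
    return 1.0 - len(a & b) / union


def mean_pairwise_distance(fingerprints) -> float:
    """Average Tanimoto distance over all unordered pairs (a simple diversity score)."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)


# Toy fingerprints (hypothetical on-bit indices, not computed from real compounds).
toy_fingerprints = [
    frozenset({1, 2, 3, 4}),
    frozenset({3, 4, 5, 6}),
    frozenset({7, 8}),
]

if __name__ == "__main__":
    print(round(mean_pairwise_distance(toy_fingerprints), 3))
```

A mean distance close to 1 indicates that the compounds share few fingerprint bits, while a value close to 0 indicates near-duplicates. In practice the on-bit sets would come from a cheminformatics toolkit rather than being written by hand.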