Department of Computer Science and Engineering, Korea University, Seoul 02841, South Korea.
Interdisciplinary Graduate Program in Bioinformatics, Korea University, Seoul 02841, South Korea.
Bioinformatics. 2019 Dec 15;35(24):5249-5256. doi: 10.1093/bioinformatics/btz411.
Traditional drug discovery approaches identify a target for a disease and then find a compound that binds to that target. In this approach, compound structure is treated as the most important feature, because structurally similar compounds are assumed to bind to the same target. Structural analogs of drugs known to bind to the target are therefore selected as drug candidates. However, compounds that are not structural analogs may still achieve the desired response. A new drug discovery method based on drug response, which can complement structure-based methods, is needed.
We implemented a Siamese neural network called ReSimNet that takes two chemical compounds as input and predicts their CMap score, which we use to measure the transcriptional response similarity of the two compounds. ReSimNet learns an embedding vector for each chemical compound in a transcriptional response space. It is trained to minimize the difference between the cosine similarity of the two compounds' embedding vectors and their CMap score. ReSimNet can therefore find pairs of compounds that are similar in response even when their structures are dissimilar. In our quantitative evaluation, ReSimNet outperformed the baseline machine learning models. The ReSimNet ensemble model achieves a Pearson correlation of 0.518 and a precision@1% of 0.989. In addition, in a qualitative analysis, we tested ReSimNet on the ZINC15 database and showed that it successfully identifies chemical compounds that are relevant to a prototype drug whose mechanism of action is known.
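The training objective described above can be illustrated with a minimal sketch. It is not the ReSimNet implementation: the one-layer tanh encoder, the 8-bit toy fingerprints, the squared-error form of the "difference", the assumption that the CMap score has been scaled to [-1, 1], and all function names (`encode`, `cosine_similarity`, `pair_loss`) are assumptions made purely for illustration.

```python
import math
import random


def encode(weights, features):
    """Hypothetical shared encoder: one linear layer followed by tanh.

    The same weights are applied to both compounds, which is what makes
    the network Siamese.
    """
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; defined as 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_loss(weights, features_a, features_b, target_score):
    """Squared difference between embedding cosine similarity and the target.

    target_score stands in for a CMap score assumed to be scaled to [-1, 1].
    """
    embedding_a = encode(weights, features_a)
    embedding_b = encode(weights, features_b)
    predicted = cosine_similarity(embedding_a, embedding_b)
    return (predicted - target_score) ** 2


# Toy data: random encoder weights and two random binary "fingerprints".
rng = random.Random(0)
weights = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(4)]
compound_a = [rng.randint(0, 1) for _ in range(8)]
compound_b = [rng.randint(0, 1) for _ in range(8)]

loss = pair_loss(weights, compound_a, compound_b, target_score=0.7)
print(f"loss for the toy pair: {loss:.4f}")
```

In a sketch like this, training would adjust `weights` to reduce `pair_loss` averaged over many compound pairs, so that compounds with similar transcriptional responses end up with similar embeddings regardless of how similar their input structures are.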
The source code and the pre-trained weights of ReSimNet are available at https://github.com/dmis-lab/ReSimNet.
Supplementary data are available at Bioinformatics online.