Siamese Recurrent Neural Network with a Self-Attention Mechanism for Bioactivity Prediction.

Author information

Fernández-Llaneza Daniel, Ulander Silas, Gogishvili Dea, Nittinger Eva, Zhao Hongtao, Tyrchan Christian

Affiliation

Department of Medicinal Chemistry, Research and Early Development, Respiratory and Immunology, Biopharmaceutical R&D, AstraZeneca, Pepparedsleden 1, SE 43183 Mölndal, Sweden.

Publication information

ACS Omega. 2021 Apr 15;6(16):11086-11094. doi: 10.1021/acsomega.1c01266. eCollection 2021 Apr 27.

Abstract

Activity prediction plays an essential role in drug discovery by directing the search for drug candidates in the relevant chemical space. Although it has been applied successfully to image recognition and semantic similarity, the Siamese neural network has rarely been explored in drug discovery, where modelling faces challenges such as insufficient data and class imbalance. Here, we present a Siamese recurrent neural network model (SiameseCHEM) based on a bidirectional long short-term memory architecture with a self-attention mechanism, which can automatically learn discriminative features from the SMILES representations of small molecules. It is then used to categorize the bioactivity of small molecules via N-shot learning. Trained on random SMILES strings, it proves robust across five different datasets for binary or categorical classification of bioactivity. Benchmarked against two baseline machine learning models that use chemistry-rich ECFP fingerprints as input, the deep learning model outperforms them on three datasets and achieves comparable performance on the other two. The failure of both baseline methods on SMILES strings suggests that the deep learning model may learn task-specific chemistry features encoded in SMILES strings.
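
As a rough illustration of the few-shot (N-shot) classification idea the abstract refers to, the sketch below assigns a query molecule to the class whose labelled support examples it resembles most. It is not the SiameseCHEM model: the abstract does not specify the procedure, so the function names `bigram_similarity` and `n_shot_classify`, the averaging rule, and the toy SMILES labels are all assumptions made for this example. In particular, the character-bigram score is only a placeholder for the similarity a trained Siamese network would output, and it carries no chemical meaning.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def bigram_similarity(smiles_a: str, smiles_b: str) -> float:
    """Placeholder similarity: Jaccard overlap of character bigrams.

    Stands in for the learned similarity of a trained Siamese network.
    This is an assumption for illustration, not the paper's model.
    """
    grams_a = {smiles_a[i:i + 2] for i in range(len(smiles_a) - 1)}
    grams_b = {smiles_b[i:i + 2] for i in range(len(smiles_b) - 1)}
    if not grams_a and not grams_b:
        return 1.0  # two strings too short to have bigrams: treat as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def n_shot_classify(
    query: str,
    support: Sequence[Tuple[str, str]],
    similarity: Callable[[str, str], float] = bigram_similarity,
) -> str:
    """Return the label whose support examples are most similar on average.

    `support` holds (SMILES, label) pairs, with N examples per class.
    """
    if not support:
        raise ValueError("support set must contain at least one example")

    # Collect the query-to-example similarity scores per class.
    scores_by_label: Dict[str, List[float]] = defaultdict(list)
    for smiles, label in support:
        scores_by_label[label].append(similarity(query, smiles))

    # Average within each class, then pick the best-scoring class.
    mean_scores = {
        label: sum(scores) / len(scores)
        for label, scores in scores_by_label.items()
    }
    return max(mean_scores, key=mean_scores.get)


if __name__ == "__main__":
    # Toy support set: labels are invented for the demo, not measured activity.
    toy_support = [
        ("CCO", "inactive"),
        ("CCCO", "inactive"),
        ("c1ccccc1", "active"),
        ("c1ccccc1C", "active"),
    ]
    print(n_shot_classify("c1ccccc1O", toy_support))
```

In a real setting the `similarity` argument would be replaced by the output of a trained pairwise model, and the support set would hold experimentally labelled molecules.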
