Descriptor Free QSAR Modeling Using Deep Learning With Long Short-Term Memory Neural Networks.

Author information

Chakravarti Suman K, Alla Sai Radha Mani

Affiliation

MultiCASE Inc., Beachwood, OH, United States.

Publication information

Front Artif Intell. 2019 Sep 6;2:17. doi: 10.3389/frai.2019.00017. eCollection 2019.

Abstract

Current practice in building quantitative structure-activity relationship (QSAR) models usually involves computing a set of descriptors for the training set compounds, applying a descriptor selection algorithm, and finally using a statistical fitting method to build the model. In this study, we explored the prospects of building good-quality, interpretable QSARs for big and diverse datasets without using any pre-calculated descriptors. To achieve this, we used different forms of Long Short-Term Memory (LSTM) neural networks, trained directly on either traditional SMILES codes or a new linear molecular notation developed as part of this work. Three endpoints were modeled: Ames mutagenicity, inhibition of the Plasmodium falciparum Dd2 strain, and inhibition of Hepatitis C Virus, with training sets ranging from 7,866 to 31,919 compounds. To boost the interpretability of the prediction results, an attention-based machine learning mechanism was used jointly with a bidirectional LSTM to detect structural alerts for the mutagenicity data set. Traditional fragment descriptor-based models were used for comparison. In the external and cross-validation experiments, the overall prediction accuracies of the LSTM models were close to those of the fragment-based models. However, the LSTM models were superior in predicting test chemicals that are dissimilar to the training set compounds, a coveted quality of QSAR models in real-world applications. In summary, it is possible to build QSAR models using LSTMs without pre-computed traditional descriptors, and the resulting models are far from being a "black box." We hope that this study will help bring large, descriptor-less QSARs into mainstream use.
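The abstract states that the LSTMs were trained directly on SMILES strings rather than on pre-computed descriptors. As a rough illustration of what that input preparation could look like, the sketch below turns SMILES strings into fixed-length integer sequences of the kind a sequence model consumes. The regex tokenizer, the function names, the padding and unknown-token indices, and the example molecules are all assumptions made for illustration; the abstract does not describe the paper's actual encoding.

```python
import re

# Assumed token pattern (not from the paper): bracket atoms, two-letter
# halogens, two-digit ring closures, then any remaining single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")

PAD, UNK = 0, 1  # hypothetical reserved indices for padding and unseen tokens


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list):
    """Map every token seen in the training strings to an integer >= 2."""
    tokens = sorted({t for s in smiles_list for t in tokenize(s)})
    return {tok: i + 2 for i, tok in enumerate(tokens)}


def encode(smiles, vocab, max_len):
    """Encode a SMILES string as a fixed-length list of token indices."""
    ids = [vocab.get(t, UNK) for t in tokenize(smiles)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Illustrative molecules only: nitrobenzene, ethanol, dichloromethane.
training_smiles = ["c1ccccc1[N+](=O)[O-]", "CCO", "ClCCl"]
vocab = build_vocab(training_smiles)
encoded = [encode(s, vocab, max_len=16) for s in training_smiles]
```

Each row of `encoded` has the same length, so the rows can be stacked into a batch and passed to an embedding layer followed by an LSTM. Treating `[N+]` or `Cl` as a single token, rather than as separate characters, keeps chemically meaningful units together, which is one plausible design choice among several.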
