Complex machine learning model needs complex testing: Examining predictability of molecular binding affinity by a graph neural network.

Authors

Nikolaienko Tymofii, Gurbych Oleksandr, Druchok Maksym

Affiliations

SoftServe, Inc., Lviv, Ukraine.

Faculty of Physics, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine.

Publication information

J Comput Chem. 2022 Apr 15;43(10):728-739. doi: 10.1002/jcc.26831. Epub 2022 Feb 24.

Abstract

Drug discovery pipelines typically involve high-throughput screening of large numbers of compounds in search of potential drug candidates. Because the chemical space of small organic molecules is huge, navigating it calls for fast and lightweight computational methods, which promotes machine-learning approaches for processing huge pools of candidates. In this contribution, we present a graph-based deep neural network for prediction of protein-drug binding affinity and assess its predictive power under thorough testing conditions. Within the suggested approach, both protein and drug molecules are represented as graphs and passed to separate graph sub-networks; the sub-network outputs are then concatenated and regressed towards a binding affinity. The neural network is trained on two binding affinity datasets: PDBbind and data imported from the RCSB Protein Data Bank. To explore the generalization capabilities of the model, we go beyond traditional random or leave-cluster-out techniques and demonstrate the need for more elaborate model performance assessment: six different strategies for test/train data partitioning (random, time- and property-arranged, protein- and ligand-clustered) with k-fold cross-validation are engaged. Finally, we discuss the model performance in terms of a set of metrics for different split strategies and fold arrangements. Our code is available at https://github.com/SoftServeInc/affinity-by-GNN.
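
To make the difference between the partitioning strategies named in the abstract more concrete, the following is a minimal sketch of three of them (random, cluster-based, and ordered by a property such as release year). It is not taken from the authors' repository, which remains the authoritative implementation. The function names, the greedy cluster assignment, and the toy protein labels and years are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def random_kfold(n_samples, k, seed=0):
    """Yield (train, test) index lists for a shuffled k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for m, f in enumerate(folds) if m != i for j in f)
        yield train, test


def clustered_kfold(cluster_ids, k):
    """Yield (train, test) index lists where no cluster spans both sides.

    Assumption: clusters are assigned greedily, largest first, to the
    currently smallest fold. Other balancing schemes are possible.
    """
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)
    folds = [[] for _ in range(k)]
    for cid in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[cid])
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for m, f in enumerate(folds) if m != i for j in f)
        yield train, test


def ordered_kfold(values, k):
    """Yield folds made of contiguous blocks after sorting by `values`.

    `values` could be a release year or any scalar property; each test
    fold then covers one contiguous range of that property.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(order)
    bounds = [n * i // k for i in range(k + 1)]
    for i in range(k):
        test = order[bounds[i]:bounds[i + 1]]
        train = order[:bounds[i]] + order[bounds[i + 1]:]
        yield train, test


# Hypothetical toy data: one protein label and one year per complex.
proteins = ["P1", "P1", "P2", "P2", "P3", "P3", "P4", "P5"]
years = [2001, 2015, 2003, 2010, 1999, 2018, 2007, 2012]

for train, test in clustered_kfold(proteins, k=3):
    shared = {proteins[i] for i in train} & {proteins[i] for i in test}
    print("test:", test, "proteins shared with train:", sorted(shared))
```

In a random split, complexes of the same protein can land on both sides of the train/test boundary, whereas the clustered split keeps each protein entirely on one side. That is one reason scores under the stricter splits may differ from those under a random split.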

