Benchmarking Accuracy and Generalizability of Four Graph Neural Networks Using Large In Vitro ADME Datasets from Different Chemical Spaces.

Affiliations

Genentech, 1 DNA Way, South San Francisco, California, 94080, United States.

F. Hoffmann-La Roche Ltd., pRED, Pharma Research & Early Development, Roche Innovation Center Basel, Grenzacherstrasse 124, 4070, Basel, Switzerland.

Publication information

Mol Inform. 2022 Aug;41(8):e2100321. doi: 10.1002/minf.202100321. Epub 2022 Feb 23.

Abstract

In this work, we benchmark a variety of single- and multi-task graph neural network (GNN) models against lower-bar and higher-bar traditional machine learning approaches employing human-engineered molecular features. We consider four GNN variants: Graph Convolutional Network (GCN), Graph Attention Network (GAT), Message Passing Neural Network (MPNN), and Attentive Fingerprint (AttentiveFP). So far, deep learning models have been primarily benchmarked using lower-bar traditional models solely based on fingerprints, while more realistic benchmarks employing fingerprints, whole-molecule descriptors and predictions from other related endpoints (e.g., LogD7.4) appear to be scarce for industrial ADME datasets. In addition to time-split test sets based on Genentech data, this study benefits from the availability of measurements from an external chemical space (Roche data). We identify GAT as a promising approach to implementing deep learning models. While all the deep learning models significantly outperform lower-bar benchmark traditional models solely based on fingerprints, only GATs seem to offer a small but consistent improvement over higher-bar benchmark traditional models. Finally, the accuracy of in vitro assays from different laboratories predicting the same experimental endpoints appears to be comparable with the accuracy of GAT single-task models, suggesting that most of the observed error from the models is a function of experimental error propagation.
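The abstract refers to time-split test sets without describing how they were built. The following is only a minimal sketch of the general idea, assuming each measurement carries a registration date: compounds measured before a cutoff are used for training, later ones for testing. The record fields, the cutoff date, the toy values, and the helper names `time_split` and `mean_absolute_error` are all hypothetical and are not taken from the paper.

```python
from datetime import date


def time_split(records, cutoff):
    """Split records into train (before cutoff) and test (on or after cutoff)."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy records; the values are illustrative, not real assay data.
records = [
    {"id": "cpd-1", "date": date(2019, 3, 1), "value": 1.0},
    {"id": "cpd-2", "date": date(2020, 7, 15), "value": 2.0},
    {"id": "cpd-3", "date": date(2021, 2, 1), "value": 3.0},
    {"id": "cpd-4", "date": date(2021, 9, 30), "value": 5.0},
]

train, test = time_split(records, cutoff=date(2021, 1, 1))

# Trivial baseline standing in for a real model: predict the training mean.
train_mean = sum(r["value"] for r in train) / len(train)
mae = mean_absolute_error([r["value"] for r in test], [train_mean] * len(test))
```

An external test set, such as measurements from a different organization's chemical space, would be scored with the same error function but would never contribute to training.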
