Graph neural networks are promising for phenotypic virtual screening on cancer cell lines.

Authors

Vishwakarma Sachin, Hernandez-Hernandez Saiveth, Ballester Pedro J

Affiliations

Evotec SAS (France), Toulouse, France.

Centre de Recherche en Cancérologie de Marseille, Marseille 13009, France.

Publication information

Biol Methods Protoc. 2024 Sep 3;9(1):bpae065. doi: 10.1093/biomethods/bpae065. eCollection 2024.

DOI:10.1093/biomethods/bpae065

PMID:39502795

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11537795/

Abstract

Artificial intelligence is increasingly driving early drug design, offering novel approaches to virtual screening (VS). Phenotypic virtual screening (PVS) aims to predict how cancer cell lines respond to different compounds by focusing on observable characteristics rather than specific molecular targets. Some studies have suggested that deep learning may not be the best approach for PVS. However, these studies are limited by the small number of molecules tested, by performance metrics poorly suited to VS, and by the absence of dissimilar-molecules splits, which better mimic the challenging chemical diversity of real-world screening libraries. Here we prepared 60 datasets, each containing approximately 30 000-50 000 molecules tested for their growth inhibitory activities on one of the NCI-60 cancer cell lines. We conducted multiple performance evaluations of each of five machine learning algorithms for PVS on these 60 problem instances. To provide an even more comprehensive evaluation, we used two model validation types: the random split and the dissimilar-molecules split. Overall, about 14 440 training runs across datasets were carried out per algorithm. The models were primarily evaluated using hit rate, a more suitable metric in VS contexts. The results show that all models are more challenged by test molecules that are substantially different from those in the training data. In both validation types, the D-MPNN algorithm, a graph-based deep neural network, was found to be the most suitable for building predictive models for this PVS problem.
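The abstract names hit rate as its primary metric but does not define it. As a rough illustration only, the sketch below uses one common virtual screening definition: the fraction of truly active molecules among the k top-scored test molecules. The function name `hit_rate_at_k`, the cutoff `k`, and the toy scores and labels are all hypothetical and are not taken from the paper, whose exact definition may differ.

```python
from typing import Sequence


def hit_rate_at_k(scores: Sequence[float], is_active: Sequence[bool], k: int) -> float:
    """Fraction of true actives among the k highest-scored molecules.

    Assumed definition for illustration; not necessarily the one used in the paper.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of molecules")

    # Rank molecule indices from highest to lowest predicted score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Count how many of the k top-ranked molecules are experimentally active.
    hits = sum(1 for i in ranked[:k] if is_active[i])
    return hits / k


# Toy data (made up): predicted scores and experimental activity labels.
scores = [0.91, 0.15, 0.78, 0.40, 0.66, 0.05]
labels = [True, False, False, False, True, False]

print(hit_rate_at_k(scores, labels, k=3))
```

Unlike accuracy over the whole test set, this quantity depends only on the molecules a screening campaign would actually select for testing, which is one reason such metrics are often preferred in VS settings.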
