Graduate Program in Electrical Engineering - Universidade Federal de Minas Gerais, Av. Antônio Carlos 6627, Belo Horizonte, 31270-901, MG, Brazil.
Department of Pharmaceutical Products - Universidade Federal de Minas Gerais, Av. Antônio Carlos 6627, Belo Horizonte, 31270-901, MG, Brazil.
J Mol Graph Model. 2024 Jan;126:108627. doi: 10.1016/j.jmgm.2023.108627. Epub 2023 Sep 29.
This research investigates the application of Graph Neural Networks (GNNs) to make drug development more cost-effective, addressing its high cost and long timelines. Class imbalance within classification datasets, such as the disparity between the numbers of active and inactive compounds, causes difficulties that can be addressed through strategies such as oversampling, undersampling, and modification of the loss function. A comparison is conducted across three distinct datasets using three different GNN architectures. This benchmarking study can guide future investigations and improve the efficacy of GNNs in drug discovery and design. For each combination of architecture and dataset, three hundred models were trained with hyperparameter tuning and evaluated using a range of metrics. Notably, oversampling outperformed the other techniques in eight experiments, showcasing its potential. While balancing techniques improve the performance of models trained on imbalanced datasets, their efficacy depends on the specifics of the dataset and the type of problem. Although oversampling benefits models trained on molecular graph datasets, more research is needed to optimize its use and to explore other solutions to class imbalance.
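The three balancing strategies named in the abstract can be illustrated with a minimal, framework-free sketch. The abstract does not state which oversampling variant, loss function, or software the authors used, so this is not their implementation: it assumes plain random oversampling and undersampling over a list of molecules (which could be graph objects) and inverse-frequency class weights applied to a binary cross-entropy loss. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen samples of each smaller
    class until every class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))  # sampling with replacement
            out_labels.append(cls)
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Hypothetical helper: keep a random subset of each class so that every
    class matches the size of the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_samples, out_labels = [], []
    for cls in counts:
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for s in rng.sample(pool, target):  # sampling without replacement
            out_samples.append(s)
            out_labels.append(cls)
    return out_samples, out_labels


def inverse_frequency_weights(labels):
    """Hypothetical helper: weight each class by total / (num_classes * count),
    so rarer classes contribute more to the loss."""
    counts = Counter(labels)
    total, k = len(labels), len(counts)
    return {cls: total / (k * n) for cls, n in counts.items()}


def weighted_bce(probs, labels, weights):
    """Class-weighted binary cross-entropy averaged over samples
    (labels are 0 = inactive, 1 = active)."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(labels)
```

For example, with one active and nine inactive molecules, `random_oversample` yields nine samples of each class, `random_undersample` keeps one of each, and `inverse_frequency_weights` gives the active class a weight of 5.0 versus about 0.56 for the inactive class. Resampling would typically be applied to the training split only, so that duplicated molecules do not leak into validation or test data.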