Fast polypharmacy side effect prediction using tensor factorization.

Author information

Lloyd Oliver, Liu Yi, Gaunt Tom R

Affiliation

MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, United Kingdom.

Publication information

Bioinformatics. 2024 Nov 28;40(12). doi: 10.1093/bioinformatics/btae706.

DOI:10.1093/bioinformatics/btae706

PMID:39582251

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11646082/

Abstract

MOTIVATION

Adverse reactions from drug combinations are increasingly common, making their accurate prediction a crucial challenge in modern medicine. Laboratory-based identification of these reactions is insufficient due to the combinatorial nature of the problem. While many computational approaches have been proposed, tensor factorization (TF) models have shown mixed results, necessitating a thorough investigation of their capabilities when properly optimized.

RESULTS

We demonstrate that TF models can achieve state-of-the-art performance on polypharmacy side effect prediction, with our best model (SimplE) achieving median scores of 0.978 area under receiver-operating characteristic curve, 0.971 area under precision-recall curve, and 1.000 AP@50 across 963 side effects. Notably, this model reaches 98.3% of its maximum performance after just two epochs of training (approximately 4 min), making it substantially faster than existing approaches while maintaining comparable accuracy. We also find that incorporating monopharmacy data as self-looping edges in the graph performs marginally better than using it to initialize embeddings.
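As a rough illustration of the quantities named above, the sketch below shows the standard SimplE scoring function (each drug has a head and a tail embedding, each side effect has a forward and an inverse embedding) and one common definition of AP@k. It is not the authors' implementation: the function names, the toy embeddings, and the choice to normalize AP@k by min(k, number of positives) are assumptions made for this example.

```python
def trilinear(a, b, c):
    """Sum over dimensions of the element-wise product a * b * c."""
    return sum(x * y * z for x, y, z in zip(a, b, c))


def simple_score(head, tail, rel, rel_inv, drug_i, drug_j):
    """SimplE score for the triple (drug_i, side effect, drug_j).

    head / tail: dicts mapping a drug to the embedding used when that drug
    is the subject / object of a triple (hypothetical layout).
    rel / rel_inv: embeddings of the side effect and of its inverse relation.
    """
    forward = trilinear(head[drug_i], rel, tail[drug_j])
    backward = trilinear(head[drug_j], rel_inv, tail[drug_i])
    return 0.5 * (forward + backward)


def average_precision_at_k(ranked_labels, k=50):
    """AP@k for labels (1 = true pair, 0 = negative) sorted by descending score.

    Assumed definition: sum of precision at each hit within the top k,
    divided by min(k, total number of positives).
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    denom = min(k, sum(ranked_labels))
    return precision_sum / denom if denom else 0.0


# Toy two-dimensional embeddings, chosen by hand for illustration only.
head = {"A": [1.0, 2.0], "B": [0.0, 1.0]}
tail = {"A": [1.0, 0.0], "B": [2.0, 1.0]}
rel = [1.0, 1.0]
rel_inv = [1.0, -1.0]

pair_score = simple_score(head, tail, rel, rel_inv, "A", "B")
ap_mixed = average_precision_at_k([1, 0, 1, 0], k=3)
ap_perfect = average_precision_at_k([1] * 50 + [0] * 10, k=50)
```

Under this definition, an AP@50 of 1.000 means that the 50 highest-scored drug pairs for a side effect are all true positives. If monopharmacy data are stored as self-looping edges, such a triple could plausibly be scored with the same function by passing the same drug as both `drug_i` and `drug_j`.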

AVAILABILITY AND IMPLEMENTATION

All code used in the experiments is available in our GitHub repository (https://doi.org/10.5281/zenodo.10684402). The implementation was carried out using Python 3.8.12 with PyTorch 1.7.1, accelerated with CUDA 11.4 on NVIDIA GeForce RTX 2080 Ti GPUs.
