Predicting drug protein interactions based on improved support vector data description in unbalanced data.

Authors

Khorramfard Alireza, Pirgazi Jamshid, Ghanbari Sorkhi Ali

Affiliation

Department of Electrical and Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran.

Publication information

Bioimpacts. 2024 Dec 30;15:30468. doi: 10.34172/bi.30468. eCollection 2025.

Abstract

INTRODUCTION

Predicting drug-protein interactions is critical in drug discovery, but traditional laboratory methods are expensive and time-consuming. Computational approaches, especially those based on machine learning, are increasingly popular. This paper introduces VASVDD, a multi-step method for predicting drug-protein interactions. First, it extracts features from protein amino acid sequences and from drug structures. Second, to address the imbalance between interacting and non-interacting pairs, it balances the data with a Support Vector Data Description (SVDD) approach, which is reported to outperform standard balancing techniques such as SMOTE and ENN. Finally, a Variational Autoencoder (VAE) reduces the feature dimension from 1074 to 32, improving both computational efficiency and predictive performance.
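The abstract does not say how SVDD is applied to balance the data, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes undersampling: a ball is fitted around the majority class, and only the majority samples closest to its centre are kept, as many as there are minority samples. The ball is the hard-margin, linear-kernel special case of SVDD (the minimum enclosing ball), approximated with the Badoiu-Clarkson iteration; a full SVDD would add slack variables and a kernel. The function names `fit_ball` and `undersample_majority`, the selection rule, and the toy 2-D data are all hypothetical.

```python
import math
import random


def fit_ball(points, iterations=200):
    """Approximate the minimum enclosing ball of `points`.

    This is hard-margin, linear-kernel SVDD. The Badoiu-Clarkson
    iteration repeatedly moves the centre toward the farthest point
    with a shrinking step size.
    """
    center = list(points[0])
    for k in range(1, iterations + 1):
        farthest = max(points, key=lambda p: math.dist(p, center))
        center = [c + (f - c) / (k + 1) for c, f in zip(center, farthest)]
    # Radius is chosen so that every point lies inside the ball.
    radius = max(math.dist(p, center) for p in points)
    return center, radius


def undersample_majority(majority, minority):
    """Assumed balancing rule: keep the len(minority) majority samples
    that lie closest to the centre of the fitted ball."""
    center, _ = fit_ball(majority)
    ranked = sorted(majority, key=lambda p: math.dist(p, center))
    return ranked[:len(minority)]


# Toy 2-D data, invented for illustration (not from the paper).
random.seed(0)
majority = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
minority = [(random.gauss(3, 1), random.gauss(3, 1)) for _ in range(20)]

center, radius = fit_ball(majority)
kept = undersample_majority(majority, minority)
print(len(majority), "majority samples reduced to", len(kept))
```

Keeping the most central samples is a design choice made here for simplicity; selecting samples near the boundary, or using the description to filter synthetic samples, would be equally compatible with the abstract's wording.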

METHODS

The proposed method was evaluated on four datasets covering enzymes, G-protein-coupled receptors, ion channels, and nuclear receptors. Without preprocessing, the Gradient Boosting Classifier was biased towards the majority class. Balancing and dimensionality reduction significantly improved accuracy, sensitivity, specificity, and F1 scores. VASVDD outperformed other dimensionality reduction methods, such as principal component analysis (PCA) and kernel PCA, and was validated across multiple classifiers, achieving higher AUROC values than existing techniques.
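To make the majority-class bias concrete, the sketch below computes the four reported metrics from a confusion matrix using their standard definitions. The label vectors are invented toy data, not results from the paper, and `binary_metrics` is a hypothetical helper name. It shows how a classifier that always predicts the majority class can score high accuracy while its sensitivity and F1 score collapse to zero.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a class is never seen or predicted.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # recall on the positive class
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Toy imbalanced labels: 95 non-interacting pairs (0), 5 interacting pairs (1).
y_true = [0] * 95 + [1] * 5

# A classifier biased towards the majority class predicts 0 for everything.
always_negative = [0] * 100
biased = binary_metrics(y_true, always_negative)
print(biased)

# A classifier that finds 4 of the 5 positives at the cost of 10 false alarms.
mixed_pred = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1
mixed = binary_metrics(y_true, mixed_pred)
print(mixed)
```

The second classifier has lower accuracy than the first but far higher sensitivity, which is why accuracy alone is a poor guide on unbalanced data.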

RESULTS

The results highlight VASVDD's effectiveness and generalizability in predicting drug-target interactions. The method outperforms state-of-the-art techniques in terms of accuracy, robustness, and efficiency, making it a promising tool in bioinformatics for drug discovery.

CONCLUSION

The datasets analyzed during the current study are not publicly available but can be obtained from the corresponding author upon reasonable request. The source code is available on GitHub: https://github.com/alirezakhorramfard/vasvdd.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/39b6/12008248/f3d1eee11351/bi-15-30468-g001.jpg
