Addressing data imbalance problems in ligand-binding site prediction using a variational autoencoder and a convolutional neural network.

Affiliations

Computer Science Department, Yuan Ze University, Taiwan.

Department of Information Management, Yuan Ze University, Taiwan.

Publication information

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab277.

Abstract

Since 2015, a fast-growing number of deep learning-based methods have been proposed for protein-ligand binding site prediction, and many have achieved promising performance. These methods, however, neglect the imbalanced nature of binding site prediction problems. Traditional data-based approaches for handling data imbalance employ linear interpolation of minority class samples, which deep neural networks may not be able to fully exploit on downstream tasks. We present a novel technique for balancing input classes by developing a deep neural network-based variational autoencoder (VAE) that learns important attributes of the minority classes through nonlinear combinations. After training, the VAE was used to generate new minority class samples, which were then added to the original data to create a balanced dataset. Finally, a convolutional neural network, which we assumed could fully integrate this nonlinearity, was used for classification. As a case study, we applied our method to the identification of FAD- and FMN-binding sites of electron transport proteins. Compared with the best classifiers that use traditional machine learning algorithms, our models greatly improved sensitivity while maintaining similar or higher levels of accuracy and specificity. We also demonstrate that our method is better than other data imbalance handling techniques, such as SMOTE, ADASYN, and class weight adjustment. Additionally, our models outperform existing predictors of the same binding types. Our method is general and can be applied to other data types for prediction problems with moderate-to-heavy data imbalances.
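The abstract contrasts linear interpolation of minority class samples (as in SMOTE) with generating them from a trained VAE, and evaluates the results by sensitivity, specificity, and accuracy. The following is a minimal Python sketch of that surrounding pipeline, not the authors' code, under simplifying assumptions: samples are plain lists of numeric features, class 1 is the minority (binding) class, and `smote_like_sample`, `balance`, and `sens_spec_acc` are hypothetical helper names. `smote_like_sample` is a stand-in generator that performs SMOTE-style interpolation; in the paper's method, the generator role is played by the trained VAE, which is not reproduced here.

```python
import random


def smote_like_sample(minority, k=5, rng=random):
    """Return one synthetic minority sample by SMOTE-style linear interpolation.

    Picks a random minority sample x and one of its k nearest minority
    neighbours x_nn, then returns x + lam * (x_nn - x) with lam ~ U[0, 1).
    Requires at least two minority samples.
    """
    i = rng.randrange(len(minority))
    x = minority[i]
    others = [m for j, m in enumerate(minority) if j != i]
    # Sort the remaining minority samples by squared Euclidean distance to x.
    others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
    x_nn = rng.choice(others[:k])
    lam = rng.random()
    return [a + lam * (b - a) for a, b in zip(x, x_nn)]


def balance(majority, minority, generate):
    """Append synthetic minority samples until both classes are the same size.

    `generate(minority)` must return one new sample (hypothetical interface).
    Labels are 0 for the majority (non-binding) class and 1 for the
    minority (binding) class.
    """
    n_new = max(0, len(majority) - len(minority))
    synthetic = [generate(minority) for _ in range(n_new)]
    X = majority + minority + synthetic
    y = [0] * len(majority) + [1] * (len(minority) + n_new)
    return X, y


def sens_spec_acc(y_true, y_pred):
    """Sensitivity TP/(TP+FN), specificity TN/(TN+FP), accuracy (TP+TN)/N."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)


if __name__ == "__main__":
    rng = random.Random(0)
    majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
    minority = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(4)]
    X, y = balance(majority, minority, lambda m: smote_like_sample(m, k=3, rng=rng))
    print(y.count(0), y.count(1))  # 20 20
    print(sens_spec_acc([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))  # (0.5, 0.666..., 0.6)
```

Because every SMOTE-style sample lies on a straight segment between two existing minority samples, it cannot leave their convex hull; a VAE-based generator instead decodes points from a learned latent space, which is the nonlinearity the abstract refers to. Swapping in a different `generate` callable changes only how synthetic minority samples are produced; the balancing and evaluation steps stay the same.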
