Fine-tuning of BERT Model to Accurately Predict Drug-Target Interactions.

Author information

Kang Hyeunseok, Goo Sungwoo, Lee Hyunjung, Chae Jung-Woo, Yun Hwi-Yeol, Jung Sangkeun

Affiliations

Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea.

College of Pharmacy, Chungnam National University, Daejeon 34134, Korea.

Publication information

Pharmaceutics. 2022 Aug 16;14(8):1710. doi: 10.3390/pharmaceutics14081710.

DOI:10.3390/pharmaceutics14081710

PMID:36015336

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9414546/

Abstract

Identifying optimal drug candidates is a central task in drug discovery. Researchers in biology and the computational sciences have sought to use machine learning (ML) to predict drug-target interactions (DTIs) efficiently. In recent years, given the growing usefulness of pretrained models in natural language processing (NLP), pretrained models have also been developed for chemical compounds and target proteins. This study sought to improve DTI prediction using ChemBERTa, a pretrained model based on Bidirectional Encoder Representations from Transformers (BERT), for chemical compounds; ChemBERTa is pretrained on simplified molecular-input line-entry system (SMILES) strings. We also employ the pretrained ProtBert for target proteins (pretrained on amino acid sequences). The BIOSNAP, DAVIS, and BindingDB databases (DBs) were used, alone or together, for training. The final model, built on both ChemBERTa and ProtBert and trained on the integrated DBs, afforded the best DTI predictive performance to date compared with previous models, as measured by the area under the receiver operating characteristic curve (ROC-AUC) and the area under the precision-recall curve (PR-AUC). Its performance was verified in a case study of 13 pairs of substrates and the metabolic enzyme cytochrome P450 (CYP), where it afforded excellent DTI prediction. As real-world interactions between drugs and target proteins are expected to exhibit specific patterns, pretraining with ChemBERTa and ProtBert could teach the model such patterns. Learning these patterns would enhance DTI prediction accuracy, provided that training uses large, well-balanced datasets covering all relationships between drugs and target proteins.
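
The abstract compares models by the area under the ROC curve and the area under the precision-recall curve. As a minimal illustration of what these two numbers measure, the sketch below computes both from binary interaction labels and predicted scores in plain Python. It is not the authors' evaluation code: the function names `roc_auc` and `pr_auc` and the toy data are hypothetical, and PR-AUC is estimated here by average precision, which is one common estimator among several.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive pair
    is scored above a random negative pair (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pr_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: the mean of the precision values measured at
    each positive example when pairs are ranked by descending score.
    Assumption: tied scores keep their input order (stable sort)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


# Toy example (made-up values): 1 = interacting drug-target pair, 0 = non-interacting.
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.2]
print(roc_auc(toy_labels, toy_scores), pr_auc(toy_labels, toy_scores))
```

Both metrics depend only on how the scores rank the pairs, not on a decision threshold. PR-AUC is more sensitive to class imbalance than ROC-AUC, which is one reason DTI studies often report the two together.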

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/83a3/9414546/5247bb5f55b1/pharmaceutics-14-01710-g004.jpg
