Data-augmented machine learning scoring functions for virtual screening of YTHDF1 m6A reader protein.

Affiliations

Institute for Advanced Study, Shenzhen University, Shenzhen, 518060, China; College of Physics and Optoelectronics Engineering, Shenzhen University, Shenzhen, 518060, China.

Institute for Advanced Study, Shenzhen University, Shenzhen, 518060, China.

Publication information

Comput Biol Med. 2024 Dec;183:109268. doi: 10.1016/j.compbiomed.2024.109268. Epub 2024 Oct 12.

DOI:10.1016/j.compbiomed.2024.109268

PMID:39405731

Abstract

Machine learning is rapidly advancing the drug discovery process, significantly enhancing speed and efficiency. Innovation in computer-aided drug design is primarily driven by structure- and ligand-based approaches. When the number of known inhibitors for a target is limited, data augmentation strategies are often preferred to enhance model performance. In this study, we developed predictive machine learning models for structure-based drug discovery, leveraging multiple traditional machine learning algorithms trained on datasets that account for target and ligand dynamics. To illustrate our approach, we present a composite model that combines classification and regression to predict YTHDF1 inhibitors using PLEC features. YTHDF1, a key m6A reader protein involved in mRNA translation, is implicated in various cancers, making it a promising therapeutic target. Traditional structure-based virtual screening (SBVS) using generic scoring functions has struggled to identify potent YTHDF1 inhibitors because of the protein's unique binding characteristics. To overcome this, we developed YTHDF1-specific machine learning scoring functions (MLSFs) to enhance SBVS efficacy. We employed various data augmentation techniques to generate a comprehensive dataset incorporating multiple conformations of the ligands and of the YTHDF1 protein. We trained 64 YTHDF1-specific MLSFs using four machine learning algorithms and evaluated them on ten test sets, focusing on their predictive and ranking power. Our results demonstrate that the artificial neural network with protein-ligand extended connectivity fingerprints (ANN-PLEC) outperforms the other MLSFs, consistently achieving a high area under the precision-recall curve (PR-AUC) of 0.87. This method shows promise for targets with limited numbers of active molecules, providing a viable path forward for drug discovery research.
The ANN-PLEC scoring function is freely available on GitHub at https://github.com/JuniML/SBVS-YTHDF1/.
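
To make the reported metric concrete, the following is a minimal sketch of how a PR-AUC value can be computed from a ranked list of screened compounds. It is not the authors' evaluation code: the abstract does not state which estimator was used, so this example assumes the common step-wise estimate known as average precision. The function name `average_precision` and the toy labels and scores are hypothetical.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for an active (inhibitor), 0 for an inactive or decoy.
    scores: predicted score, where higher means "more likely active".
    """
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    # Rank compounds from the highest to the lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at the rank of this active
    return total / n_actives


# Hypothetical toy screen (assumed data): 3 actives among 8 scored compounds.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.95, 0.90, 0.80, 0.40, 0.35, 0.30, 0.20, 0.10]
ap = average_precision(labels, scores)
baseline = sum(labels) / len(labels)  # fraction of actives in the set
print(f"PR-AUC (average precision): {ap:.3f}, active fraction: {baseline:.3f}")
```

A random ranking gives a PR-AUC close to the fraction of actives in the test set, so a value such as 0.87 should be read against that fraction and not against 0.5, as would be the case for ROC-AUC.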

Similar articles

Beware of the generic machine learning-based scoring functions in structure-based virtual screening.

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa070.

SCORCH: Improving structure-based virtual screening with machine learning classifiers, data augmentation, and uncertainty estimation.

J Adv Res. 2023 Apr;46:135-147. doi: 10.1016/j.jare.2022.07.001. Epub 2022 Jul 25.

Accuracy or novelty: what can we gain from target-specific machine-learning-based scoring functions in virtual screening?

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbaa410.

Selecting machine-learning scoring functions for structure-based virtual screening.

Drug Discov Today Technol. 2019 Dec;32-33:81-87. doi: 10.1016/j.ddtec.2020.09.001. Epub 2020 Sep 19.

Target-Specific Machine Learning Scoring Function Improved Structure-Based Virtual Screening Performance for SARS-CoV-2 Drugs Development.

Int J Mol Sci. 2022 Sep 20;23(19):11003. doi: 10.3390/ijms231911003.

ML-PLIC: a web platform for characterizing protein-ligand interactions and developing machine learning-based scoring functions.

Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad295.

Artificial intelligence to deep learning: machine intelligence approach for drug discovery.

Mol Divers. 2021 Aug;25(3):1315-1360. doi: 10.1007/s11030-021-10217-3. Epub 2021 Apr 12.

A practical guide to machine-learning scoring for structure-based virtual screening.

Nat Protoc. 2023 Nov;18(11):3460-3511. doi: 10.1038/s41596-023-00885-w. Epub 2023 Oct 16.

Comprehensive machine learning boosts structure-based virtual screening for PARP1 inhibitors.

J Cheminform. 2024 Apr 7;16(1):40. doi: 10.1186/s13321-024-00832-1.

Cited by

The future of pharmaceuticals: Artificial intelligence in drug discovery and development.

J Pharm Anal. 2025 Aug;15(8):101248. doi: 10.1016/j.jpha.2025.101248. Epub 2025 Feb 26.

Graph convolutional neural networks improved target-specific scoring functions for cGAS and kRAS in virtual screening.

Comput Struct Biotechnol J. 2025 May 23;27:2176-2185. doi: 10.1016/j.csbj.2025.05.023. eCollection 2025.

SPLIF-Enhanced Attention-Driven 3D CNNs for Precise and Reliable Protein-Ligand Interaction Modeling for METTL3.

ACS Omega. 2025 Apr 16;10(16):16748-16761. doi: 10.1021/acsomega.5c00538. eCollection 2025 Apr 29.
