Data-augmented machine learning scoring functions for virtual screening of YTHDF1 m6A reader protein.

Affiliations

Institute for Advanced Study, Shenzhen University, Shenzhen, 518060, China; College of Physics and Optoelectronics Engineering, Shenzhen University, Shenzhen, 518060, China.

Institute for Advanced Study, Shenzhen University, Shenzhen, 518060, China.

Publication information

Comput Biol Med. 2024 Dec;183:109268. doi: 10.1016/j.compbiomed.2024.109268. Epub 2024 Oct 12.

Abstract

Machine learning is rapidly advancing the drug discovery process, significantly enhancing its speed and efficiency. Innovation in computer-aided drug design is driven primarily by structure- and ligand-based approaches. When the number of known inhibitors for a target is limited, data augmentation strategies are often preferred to improve model performance. In this study, we developed predictive machine learning models for structure-based drug discovery by training multiple traditional machine learning algorithms on target- and ligand-dynamics-aware datasets. To illustrate our approach, we present a composite model that combines classification and regression to predict YTHDF1 inhibitors using PLEC features. YTHDF1, a key m6A reader protein involved in mRNA translation, is implicated in various cancers, making it a promising therapeutic target. Traditional structure-based virtual screening (SBVS) with generic scoring functions has struggled to identify potent YTHDF1 inhibitors because of the protein's unique binding characteristics. To overcome this, we developed YTHDF1-specific machine learning scoring functions (MLSFs) to improve SBVS efficacy. We employed various data augmentation techniques to generate a comprehensive dataset incorporating multiple conformations of both the ligands and the YTHDF1 protein. We trained 64 YTHDF1-specific MLSFs using four machine learning algorithms and evaluated them on ten test sets, focusing on their predictive and ranking power. Our results demonstrate that the artificial neural network with protein-ligand extended connectivity fingerprints (ANN-PLEC) outperforms the other MLSFs, consistently achieving a high area under the precision-recall curve (PR-AUC) of 0.87. This method shows promise for targets with few known active molecules and offers a viable path forward for drug discovery research.
The ANN-PLEC scoring function is freely available to other researchers on GitHub: https://github.com/JuniML/SBVS-YTHDF1/
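
The abstract reports ranking performance as PR-AUC. As an illustration only, the following minimal sketch computes the common step-wise estimate of PR-AUC (average precision) for a ranked virtual-screening list. It assumes binary activity labels (1 = active, 0 = inactive or decoy) and that a higher predicted score means "more likely active"; the function name `average_precision` is hypothetical and the code is not taken from the authors' repository. For distinct scores it gives the same value as scikit-learn's `average_precision_score`.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise PR-AUC (average precision) of a ranked screening list.

    Hypothetical helper for illustration.
    labels: 1 for an active compound, 0 for an inactive/decoy.
    scores: predicted scores; higher means more likely to be active.
    Tied scores keep their input order (a simplification).
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one active compound is required")

    # Rank compounds from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            # Precision at the rank where recall increases by 1 / n_pos.
            total += hits / rank
    return total / n_pos


if __name__ == "__main__":
    # Two actives ranked 1st and 3rd out of four compounds.
    print(round(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]), 4))  # prints 0.8333
```

PR-AUC is a natural choice when actives are rare: a random ranking scores roughly the fraction of actives in the set, whereas ROC-AUC stays at 0.5 regardless of class balance, so PR-AUC more directly reflects how well actives are concentrated at the top of the list.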

