Assay2Mol: large language model-based drug design using BioAssay context.

Author information

Deng Yifan, Ericksen Spencer S, Gitter Anthony

Publication information

ArXiv. 2025 Jul 16:arXiv:2507.12574v1.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12288650/

Abstract

Scientific databases aggregate vast amounts of quantitative data alongside descriptive text. In biochemistry, molecule screening assays evaluate the functional responses of candidate molecules against disease targets. Unstructured text that describes the biological mechanisms through which these targets operate, experimental screening protocols, and other attributes of assays offers rich information for new drug discovery campaigns but has remained untapped because of its unstructured format. We present Assay2Mol, a large language model-based workflow that can capitalize on the vast body of existing biochemical screening assays for early-stage drug discovery. Assay2Mol retrieves existing assay records involving targets similar to the new target and generates candidate molecules using in-context learning with the retrieved assay screening data. Assay2Mol outperforms recent machine learning approaches that generate candidate ligand molecules for target protein structures, while also promoting more synthesizable molecule generation.
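
The retrieve-then-prompt idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `AssayRecord` type, the token-overlap (Jaccard) similarity used in place of whatever retrieval method the paper employs, the prompt wording, and the toy assay data are all assumptions made for illustration. The sketch only builds the in-context prompt; sending it to a language model is left out.

```python
from dataclasses import dataclass


@dataclass
class AssayRecord:
    """Hypothetical, simplified stand-in for a screening assay record."""
    description: str                 # free-text assay description
    results: list[tuple[str, bool]]  # (SMILES, is_active) pairs


def tokenize(text: str) -> set[str]:
    """Lowercase and split on whitespace (a crude stand-in for a text embedding)."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query: str, assays: list[AssayRecord], k: int = 2) -> list[AssayRecord]:
    """Return the k assays whose descriptions overlap most with the query text."""
    q = tokenize(query)
    ranked = sorted(
        assays,
        key=lambda rec: jaccard(q, tokenize(rec.description)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(target: str, retrieved: list[AssayRecord]) -> str:
    """Assemble an in-context prompt from retrieved assays and their outcomes."""
    lines = [f"Target of interest: {target}", ""]
    for i, assay in enumerate(retrieved, start=1):
        lines.append(f"Assay {i}: {assay.description}")
        for smiles, is_active in assay.results:
            label = "active" if is_active else "inactive"
            lines.append(f"  {smiles} -> {label}")
        lines.append("")
    lines.append("Propose new candidate molecules as SMILES strings.")
    return "\n".join(lines)


# Placeholder data only: these descriptions and activity labels are made up
# for the example and are not real screening results.
ASSAYS = [
    AssayRecord("inhibition of kinase A in a cell-based assay",
                [("CCO", True), ("CCN", False)]),
    AssayRecord("binding to receptor B in a membrane assay",
                [("c1ccccc1", True)]),
    AssayRecord("inhibition of kinase C",
                [("CC(=O)O", False)]),
]

if __name__ == "__main__":
    target = "inhibition of kinase A"
    print(build_prompt(target, retrieve(target, ASSAYS, k=2)))
```

In this sketch both active and inactive molecules are placed in the prompt so that the model sees positive and negative examples. A realistic pipeline would likely replace the token-overlap score with a stronger text similarity measure and validate the generated SMILES strings before use.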

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9791/12288650/c7f8b5f23dff/nihpp-2507.12574v1-f0001.jpg
