Hybrid fragment-SMILES tokenization for ADMET prediction in drug discovery.

Affiliations

Department of Computer Science, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada.

Digital Technologies Research Centre, National Research Council Canada, 1200 Montreal Road, Ottawa, ON, K1A 0R6, Canada.

Publication information

BMC Bioinformatics. 2024 Aug 1;25(1):255. doi: 10.1186/s12859-024-05861-z.

DOI:10.1186/s12859-024-05861-z
PMID:39090573
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11295479/
Abstract

BACKGROUND

Drug discovery and development is the extremely costly and time-consuming process of identifying new molecules that can interact with a biomarker target to interrupt the disease pathway of interest. In addition to binding the target, a drug candidate needs to satisfy multiple properties affecting absorption, distribution, metabolism, excretion, and toxicity (ADMET). Artificial intelligence approaches provide an opportunity to improve each step of the drug discovery and development process, in which the first question faced by us is how a molecule can be informatively represented such that the in-silico solutions are optimized.

RESULTS

This study introduces a novel hybrid SMILES-fragment tokenization method, coupled with two pre-training strategies, utilizing a Transformer-based model. We investigate the efficacy of hybrid tokenization in improving the performance of ADMET prediction tasks. Our approach leverages MTL-BERT, an encoder-only Transformer model that achieves state-of-the-art ADMET predictions, and contrasts the standard SMILES tokenization with our hybrid method across a spectrum of fragment library cutoffs.
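
The idea of mixing fragment-level and character-level tokens can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how fragments are derived or matched, so the greedy longest-match rule, the regular expression, and the names `smiles_tokens` and `hybrid_tokens` are assumptions made for illustration. Matching on SMILES substrings is also a simplification, since a chemically sound fragmenter would work on the molecular graph.

```python
import re

# Widely used style of pattern for splitting SMILES into atom-level tokens
# (bracket atoms, two-letter halogens, ring-closure digits, bond symbols).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnops]|%\d{2}|\d|[=#\-\+\(\)/\\\.@~\*\$:]"
)


def smiles_tokens(smiles):
    """Standard tokenization: one token per atom, bond, branch, or ring symbol."""
    return SMILES_TOKEN.findall(smiles)


def hybrid_tokens(smiles, fragment_vocab):
    """Hypothetical hybrid tokenization (assumed greedy longest-match).

    Where a run of atom-level tokens spells out a fragment from
    fragment_vocab, emit the fragment as a single token; otherwise fall
    back to the atom-level token.
    """
    atoms = smiles_tokens(smiles)
    # Pre-tokenize fragments so matching happens on token boundaries.
    frag_keys = {tuple(smiles_tokens(f)) for f in fragment_vocab}
    max_len = max((len(k) for k in frag_keys), default=0)

    out, i = [], 0
    while i < len(atoms):
        matched = False
        # Try the longest window first, down to two tokens.
        for n in range(min(max_len, len(atoms) - i), 1, -1):
            if tuple(atoms[i:i + n]) in frag_keys:
                out.append("".join(atoms[i:i + n]))
                i += n
                matched = True
                break
        if not matched:
            out.append(atoms[i])
            i += 1
    return out


if __name__ == "__main__":
    vocab = {"C(=O)O", "c1ccccc1"}  # illustrative fragments, not from the paper
    print(smiles_tokens("CC(=O)Oc1ccccc1C(=O)O"))
    print(hybrid_tokens("CC(=O)Oc1ccccc1C(=O)O", vocab))
```

With an empty fragment vocabulary the hybrid function reduces to plain SMILES tokenization, which makes it easy to compare the two schemes on the same input.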

CONCLUSION

The findings reveal that while an excess of fragments can impede performance, using hybrid tokenization with high frequency fragments enhances results beyond the base SMILES tokenization. This advancement underscores the potential of integrating fragment- and character-level molecular features within the training of Transformer models for ADMET property prediction.
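
The role of a fragment library cutoff can be sketched as a simple frequency filter. This is a hedged illustration only: the abstract does not give the cutoff values or the fragmentation procedure, so the function name `build_fragment_vocab`, the `min_count` parameter, and the toy corpus below are assumptions. The sketch takes molecules that have already been split into fragments and keeps only fragments that occur at least `min_count` times.

```python
from collections import Counter


def build_fragment_vocab(fragmented_molecules, min_count):
    """Keep fragments seen at least min_count times across the corpus.

    fragmented_molecules: iterable of lists of fragment strings, one list
    per molecule (the fragmentation step itself is assumed to be done).
    """
    counts = Counter(
        frag for frags in fragmented_molecules for frag in frags
    )
    return {frag for frag, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Toy corpus of pre-fragmented molecules (illustrative values only).
    corpus = [
        ["c1ccccc1", "C(=O)O"],
        ["c1ccccc1", "CN"],
        ["c1ccccc1", "C(=O)O"],
    ]
    for cutoff in (1, 2, 3):
        vocab = build_fragment_vocab(corpus, cutoff)
        print(cutoff, len(vocab), sorted(vocab))
```

A higher cutoff leaves a smaller library of frequent fragments, which corresponds to the setting the conclusion reports as beneficial; a low cutoff admits many rare fragments, the situation described as an excess of fragments.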

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7589/11295479/4525a92edc53/12859_2024_5861_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7589/11295479/d3c78988a9b6/12859_2024_5861_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7589/11295479/e9aab1752bd0/12859_2024_5861_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7589/11295479/15b675e030eb/12859_2024_5861_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7589/11295479/53e77362ec7e/12859_2024_5861_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7589/11295479/7daed1bc5be7/12859_2024_5861_Fig6_HTML.jpg
