A hitchhiker's guide to deep chemical language processing for bioactivity prediction.

Authors

Özçelik Rıza, Grisoni Francesca

Affiliations

Eindhoven University of Technology, Institute for Complex Molecular Systems, Eindhoven AI Systems Institute, Dept. of Biomedical Engineering, Eindhoven, Netherlands.

Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Netherlands.

Publication information

Digit Discov. 2024 Dec 16;4(2):316-325. doi: 10.1039/d4dd00311j. eCollection 2025 Feb 12.

Abstract

Deep learning has significantly accelerated drug discovery, with 'chemical language' processing (CLP) emerging as a prominent approach. CLP approaches learn from molecular string representations (e.g., Simplified Molecular Input Line Entry Systems [SMILES] and Self-Referencing Embedded Strings [SELFIES]) with methods akin to natural language processing. Despite their growing importance, training predictive CLP models is far from trivial, as it involves many 'bells and whistles'. Here, we analyze the key elements of CLP and provide guidelines for newcomers and experts. Our study spans three neural network architectures, two string representations, three embedding strategies, across ten bioactivity datasets, for both classification and regression purposes. This 'hitchhiker's guide' not only underscores the importance of certain methodological decisions, but it also equips researchers with practical recommendations on ideal choices, e.g., in terms of neural network architectures, molecular representations, and hyperparameter optimization.
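To make the idea of treating molecules as 'chemical language' concrete, the following is a minimal sketch of how SMILES strings might be split into tokens and turned into numeric input for a neural network. It is not taken from the paper: the regular expression, the helper names (`tokenize`, `build_vocab`, `encode`, `one_hot`), the padding token, and the use of one-hot vectors are all illustrative assumptions, and the abstract does not state which three embedding strategies were compared.

```python
import re

# Assumed, simplified token pattern (not from the paper): bracket atoms such as
# [NH3+], the two-letter halogens Br and Cl, two-digit ring closures like %12,
# and otherwise any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(token_lists):
    """Map each distinct token to an integer id; id 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncating or right-padding to max_len."""
    ids = [vocab[t] for t in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


def one_hot(ids, vocab_size):
    """One possible (assumed) vector encoding: one binary vector per token id."""
    return [[1 if i == j else 0 for j in range(vocab_size)] for i in ids]


# Hypothetical example molecules: ethanol, chlorobenzene, methylammonium.
smiles = ["CCO", "c1ccccc1Cl", "C[NH3+]"]
tokenized = [tokenize(s) for s in smiles]
vocab = build_vocab(tokenized)
encoded = [encode(t, vocab, max_len=10) for t in tokenized]
vectors = one_hot(encoded[0], len(vocab))
```

In this sketch the integer ids in `encoded` could be fed to a learnable embedding layer, while `vectors` shows the fixed one-hot alternative. A SELFIES pipeline would look similar but would tokenize on SELFIES symbols rather than SMILES characters.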

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bed0/11667676/d04ef4778857/d4dd00311j-f1.jpg
