Li Na, Qiao Jianbo, Gao Fei, Wang Yanling, Shi Hua, Zhang Zilong, Cui Feifei, Zhang Lichao, Wei Leyi
School of Computer and Information Engineering, Qilu Institute of Technology, Jinan 250200, China.
School of Software, Shandong University, Jinan 250100, China.
J Chem Inf Model. 2025 Jun 9;65(11):5518-5527. doi: 10.1021/acs.jcim.5c00895. Epub 2025 May 27.
Deep learning models have demonstrated their potential in learning effective molecular representations critical for drug property prediction and drug discovery. Despite significant advancements in leveraging multimodal drug molecule semantics, existing approaches often struggle with challenges such as low-quality data and structural complexity. Large language models (LLMs) excel in generating high-quality molecular representations due to their robust characterization capabilities. In this work, we introduce GICL, a cross-modal contrastive learning framework that integrates LLM-derived embeddings with molecular image representations. Specifically, LLMs extract feature representations from the SMILES strings of drug molecules, which are then contrasted with graphical representations of molecular images to achieve a holistic understanding of molecular features. Experimental results demonstrate that GICL achieves state-of-the-art performance on the ADMET task while offering interpretable insights into drug properties, thereby facilitating more efficient drug design and discovery.
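The cross-modal contrast described in the abstract can be illustrated with a small sketch. The abstract does not state which loss GICL uses, so the symmetric InfoNCE objective below is an assumption, borrowed from common practice in contrastive learning. The function names, the temperature value, and the toy embeddings are hypothetical. In a real pipeline the two inputs would come from an LLM encoder over SMILES strings and an image encoder over rendered molecules; here they are plain lists of floats so that the example runs with the standard library alone.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(smiles_emb, image_emb, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    smiles_emb[i] and image_emb[i] are assumed to describe the same molecule.
    This is an illustrative loss, not necessarily the one used by GICL.
    """
    s = [l2_normalize(v) for v in smiles_emb]
    g = [l2_normalize(v) for v in image_emb]
    # Cosine similarity between every SMILES/image pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(si, gj)) / temperature for gj in g]
        for si in s
    ]
    logits_t = [list(col) for col in zip(*logits)]  # image-to-SMILES direction
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


# Toy batch of two molecules with 2-dimensional embeddings (made-up values).
smiles_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned_images = [[1.0, 0.0], [0.0, 1.0]]
swapped_images = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = symmetric_info_nce(smiles_batch, aligned_images)
loss_swapped = symmetric_info_nce(smiles_batch, swapped_images)
```

The loss is small when each SMILES embedding is closest to the image embedding of the same molecule, and large when the pairing is swapped. Minimizing it pulls the two views of a molecule together in the shared embedding space.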