Peng Cheng Laboratory, Shenzhen, 518055, Guangdong Province, China; School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, 510006, Guangdong Province, China.
Peng Cheng Laboratory, Shenzhen, 518055, Guangdong Province, China.
Comput Biol Med. 2024 Mar;171:108073. doi: 10.1016/j.compbiomed.2024.108073. Epub 2024 Jan 30.
Large language models have made significant strides in natural language processing, enabling innovative applications in molecular science by processing textual representations of molecules. However, most existing language models cannot capture the rich information in complex molecular structures or images. In this paper, we introduce GIT-Mol, a multi-modal large language model that integrates Graph, Image, and Text information. To facilitate the integration of multi-modal molecular data, we propose GIT-Former, a novel architecture capable of aligning all modalities into a unified latent space. We achieve a 5%-10% accuracy increase in property prediction and a 20.2% boost in molecule generation validity compared to the baselines. With the any-to-language molecular translation strategy, our model has the potential to perform more downstream tasks, such as compound name recognition and chemical reaction prediction.
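The central idea described above, aligning graph, image, and text embeddings into a single latent space, can be illustrated with a minimal sketch. This is a hypothetical example, not the paper's GIT-Former implementation: the encoder outputs, dimensions, and projection matrices are all assumed for illustration, and each modality is simply projected and L2-normalized so that cross-modal similarity becomes a dot product (the quantity a contrastive alignment objective would maximize for matching pairs).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed per-modality feature sizes and shared latent dimension (illustrative only).
D_GRAPH, D_IMAGE, D_TEXT = 64, 128, 96
D_LATENT = 32

# One linear projection per modality maps its features into the same latent space.
W_graph = rng.normal(size=(D_GRAPH, D_LATENT)) / np.sqrt(D_GRAPH)
W_image = rng.normal(size=(D_IMAGE, D_LATENT)) / np.sqrt(D_IMAGE)
W_text = rng.normal(size=(D_TEXT, D_LATENT)) / np.sqrt(D_TEXT)

def project(x, W):
    """Project a modality embedding into the unified latent space and L2-normalize it."""
    z = x @ W
    return z / np.linalg.norm(z)

# Stand-ins for encoder outputs: a pooled molecular-graph embedding, a
# molecule-image embedding, and a SMILES/description text embedding.
graph_feat = rng.normal(size=D_GRAPH)
image_feat = rng.normal(size=D_IMAGE)
text_feat = rng.normal(size=D_TEXT)

z_g = project(graph_feat, W_graph)
z_i = project(image_feat, W_image)
z_t = project(text_feat, W_text)

# All three modalities now live in the same latent space, so cross-modal
# similarity reduces to a dot product bounded in [-1, 1].
sim_graph_text = float(z_g @ z_t)
sim_image_text = float(z_i @ z_t)
print(z_g.shape, sim_graph_text, sim_image_text)
```

In a trained model the projections would be learned (e.g. with a contrastive loss over matched molecule/description pairs) rather than random, but the shapes and the dot-product similarity structure are the same.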