Peng Cheng Laboratory, Shenzhen, 518055, Guangdong Province, China; School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, 510006, Guangdong Province, China.
Peng Cheng Laboratory, Shenzhen, 518055, Guangdong Province, China.
Comput Biol Med. 2024 Mar;171:108073. doi: 10.1016/j.compbiomed.2024.108073. Epub 2024 Jan 30.
Large language models have made significant strides in natural language processing, enabling innovative applications in molecular science by processing textual representations of molecules. However, most existing language models cannot capture the rich information in complex molecular structures or images. In this paper, we introduce GIT-Mol, a multi-modal large language model that integrates Graph, Image, and Text information. To facilitate the integration of multi-modal molecular data, we propose GIT-Former, a novel architecture capable of aligning all modalities into a unified latent space. We achieve a 5%-10% accuracy increase in property prediction and a 20.2% boost in molecule generation validity compared to the baselines. With the any-to-language molecular translation strategy, our model has the potential to perform more downstream tasks, such as compound name recognition and chemical reaction prediction.
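The central idea described above, aligning graph, image, and text embeddings into a single latent space, can be illustrated with a minimal sketch. This is a hypothetical example, not the paper's GIT-Former implementation: the encoder outputs, dimensions, and projection matrices are all assumed for illustration, and each modality is simply projected and L2-normalized so that cross-modal similarity becomes a dot product (the quantity a contrastive alignment objective would maximize for matching pairs).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed per-modality feature sizes and shared latent dimension (illustrative only).
D_GRAPH, D_IMAGE, D_TEXT = 64, 128, 96
D_LATENT = 32

# One linear projection per modality maps its features into the same latent space.
W_graph = rng.normal(size=(D_GRAPH, D_LATENT)) / np.sqrt(D_GRAPH)
W_image = rng.normal(size=(D_IMAGE, D_LATENT)) / np.sqrt(D_IMAGE)
W_text = rng.normal(size=(D_TEXT, D_LATENT)) / np.sqrt(D_TEXT)

def project(x, W):
    """Project a modality embedding into the unified latent space and L2-normalize it."""
    z = x @ W
    return z / np.linalg.norm(z)

# Stand-ins for encoder outputs: a pooled molecular-graph embedding, a
# molecule-image embedding, and a SMILES/description text embedding.
graph_feat = rng.normal(size=D_GRAPH)
image_feat = rng.normal(size=D_IMAGE)
text_feat = rng.normal(size=D_TEXT)

z_g = project(graph_feat, W_graph)
z_i = project(image_feat, W_image)
z_t = project(text_feat, W_text)

# All three modalities now live in the same latent space, so cross-modal
# similarity reduces to a dot product bounded in [-1, 1].
sim_graph_text = float(z_g @ z_t)
sim_image_text = float(z_i @ z_t)
print(z_g.shape, sim_graph_text, sim_image_text)
```

In a trained model the projections would be learned (e.g. with a contrastive loss over matched molecule/description pairs) rather than random, but the shapes and the dot-product similarity structure are the same.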