Aligning large language models and geometric deep models for protein representation.

Authors

Shu Dong, Duan Bingbing, Guo Kai, Zhou Kaixiong, Tang Jiliang, Du Mengnan

Affiliations

Northwestern University, Computer Science Department, Evanston, IL 60201, USA.

University of Pittsburgh, Biological Sciences Department, Pittsburgh, PA 15260, USA.

Publication information

Patterns (N Y). 2025 Apr 11;6(5):101227. doi: 10.1016/j.patter.2025.101227. eCollection 2025 May 9.

Abstract

In this study, we explore the alignment of multimodal representations between large language models (LLMs) and geometric deep models (GDMs) in the protein domain. We comprehensively evaluate three LLMs with four protein-specialized GDMs. Our work examines alignment factors from both model and protein perspectives, identifying challenges in current alignment methodologies and proposing strategies to improve the alignment process. Experimental results reveal that GDMs incorporating both graph and 3D structural information align better with LLMs, larger LLMs demonstrate improved alignment capabilities, and protein rarity significantly impacts alignment performance. We also find that increasing GDM embedding dimensions, using two-layer projection heads, and fine-tuning LLMs on protein-specific data substantially enhance alignment quality. Last, we demonstrate that improved alignment correlates with better downstream performance and reduced hallucination in protein-focused multimodal LLMs.
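As a rough illustration of the idea of a two-layer projection head, the sketch below maps a structure embedding into the dimensionality of a text embedding and scores paired embeddings by mean cosine similarity. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the dimensions (16, 32, 24), the ReLU nonlinearity, the cosine-based score, and the random stand-in embeddings are all hypothetical. The weights are untrained, so the resulting score carries no meaning beyond showing the data flow.

```python
import math
import random


def init_linear(in_dim, out_dim, rng):
    """Create one linear layer (weights, bias) with small random weights."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(out_dim)]
               for _ in range(in_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(x, layer):
    """Apply y = xW + b to a single vector x."""
    weights, bias = layer
    return [sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
            for j in range(len(bias))]


def project(x, head):
    """Two-layer projection head: linear -> ReLU -> linear (assumed design)."""
    hidden = [max(v, 0.0) for v in linear(x, head[0])]
    return linear(hidden, head[1])


def cosine(a, b, eps=1e-12):
    """Cosine similarity; eps guards against division by zero."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b + eps)


def alignment_score(structure_embs, text_embs, head):
    """Mean cosine similarity between projected structure and text embeddings."""
    sims = [cosine(project(s, head), t)
            for s, t in zip(structure_embs, text_embs)]
    return sum(sims) / len(sims)


# Hypothetical stand-ins: 4 proteins, 16-d structure and 24-d text embeddings.
rng = random.Random(0)
head = (init_linear(16, 32, rng), init_linear(32, 24, rng))
structure_embs = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(4)]
text_embs = [[rng.gauss(0.0, 1.0) for _ in range(24)] for _ in range(4)]
score = alignment_score(structure_embs, text_embs, head)
```

In practice the head would be trained on paired protein structures and text descriptions before any such score is interpreted. The training objective is left out here because the abstract does not specify it.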

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc32/12142629/fc77e39f2146/gr1.jpg
