FusionCLM: enhanced molecular property prediction via knowledge fusion of chemical language models.

Author information

Lu Yutong, Li Yan Yi, Sun Yan, Hu Pingzhao

Affiliations

Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada.

Department of Biochemistry, Western University, London, ON, Canada.

Publication information

J Cheminform. 2025 Aug 29;17(1):133. doi: 10.1186/s13321-025-01073-6.

DOI:10.1186/s13321-025-01073-6

PMID:40883821

Abstract

Chemical language models (CLMs) have demonstrated the ability to extract patterns and make predictions from large volumes of data in the Simplified Molecular Input Line Entry System (SMILES), a notation used to represent molecular structures. Different CLMs, built on different architectures, can provide distinct insights into molecular properties. To harness these differences, we propose FusionCLM, a novel stacking-ensemble learning algorithm that integrates the outputs of multiple CLMs into a unified framework. FusionCLM first generates SMILES embeddings, predictions, and losses from each CLM. Auxiliary models are then trained on these first-level predictions and embeddings to estimate test losses during inference. The losses and predictions are concatenated to create an integrated feature matrix, which is used to train second-level meta-models for the final predictions. Empirical testing on five datasets demonstrates that FusionCLM outperforms both the individual first-level CLMs and three advanced multimodal deep learning frameworks, showcasing its potential for advancing molecular property prediction.
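The data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the first-level CLMs are assumed to have already produced embeddings and predictions, closed-form ridge regression stands in for both the auxiliary models and the meta-model, and per-sample squared error is assumed as the loss. All function names (`fit_fusion`, `predict_fusion`, and so on) are hypothetical.

```python
import numpy as np


def _with_bias(X):
    # Append a constant column so the linear models can learn an intercept.
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_ridge(X, y, alpha=1.0):
    # Closed-form ridge regression (a stand-in for any regressor).
    Xb = _with_bias(X)
    return np.linalg.solve(Xb.T @ Xb + alpha * np.eye(Xb.shape[1]), Xb.T @ y)


def predict_ridge(w, X):
    return _with_bias(X) @ w


def fit_fusion(train_embs, train_preds, y_train):
    """train_embs: list of (n, d_k) arrays; train_preds: list of (n,) arrays."""
    # Per-sample losses of each first-level model (assumption: squared error).
    losses = [(p - y_train) ** 2 for p in train_preds]
    # One auxiliary model per CLM: [embedding, prediction] -> loss.
    aux = [
        fit_ridge(np.column_stack([e, p]), l)
        for e, p, l in zip(train_embs, train_preds, losses)
    ]
    # Meta-model trained on the integrated matrix [predictions, losses].
    meta = fit_ridge(np.column_stack(train_preds + losses), y_train)
    return aux, meta


def predict_fusion(aux, meta, test_embs, test_preds):
    # Labels are unavailable at inference, so losses are estimated
    # by the auxiliary models instead of being computed directly.
    est_losses = [
        predict_ridge(w, np.column_stack([e, p]))
        for w, e, p in zip(aux, test_embs, test_preds)
    ]
    return predict_ridge(meta, np.column_stack(test_preds + est_losses))


# Synthetic demo: three pretend first-level models with noisy predictions.
rng = np.random.default_rng(0)
y_tr, y_te = rng.normal(size=50), rng.normal(size=10)
embs_tr = [rng.normal(size=(50, 4)) for _ in range(3)]
embs_te = [rng.normal(size=(10, 4)) for _ in range(3)]
preds_tr = [y_tr + rng.normal(scale=0.3, size=50) for _ in range(3)]
preds_te = [y_te + rng.normal(scale=0.3, size=10) for _ in range(3)]

aux_models, meta_model = fit_fusion(embs_tr, preds_tr, y_tr)
final_preds = predict_fusion(aux_models, meta_model, embs_te, preds_te)
```

The key design point is the asymmetry between training and inference: the meta-model sees true first-level losses during training, but only auxiliary-model estimates of those losses at test time.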

