Fine-tuning protein language models to understand the functional impact of missense variants.

Authors

Saadat Ali, Fellay Jacques

Affiliations

School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.

Swiss Institute of Bioinformatics, Lausanne, Switzerland.

Publication information

Comput Struct Biotechnol J. 2025 May 28;27:2199-2207. doi: 10.1016/j.csbj.2025.05.022. eCollection 2025.

DOI:10.1016/j.csbj.2025.05.022

PMID:40520595

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12166733/

Abstract

Elucidating the functional effects of missense variants is crucial yet challenging. To investigate their impact, we fine-tuned protein language models, including ESM2 and ProtT5, to classify 20 protein features at amino acid resolution. In addition, we trained a fully connected neural network classifier on frozen embeddings and compared its performance to fine-tuning in order to quantify the added value of task-specific adaptation. We then used the fine-tuned models to: 1) identify protein features enriched in either pathogenic or benign missense variants, and 2) compare the predicted feature profiles of proteins with reference and alternate alleles to understand how missense variants affect protein functionality. We show that our models can be used to reclassify variants of uncertain significance and provide mechanistic insights into the functional consequences of missense mutations.
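The second use case in the abstract, comparing predicted feature profiles for the reference and alternate alleles, can be illustrated with a minimal sketch. It is not the authors' implementation. The `predict` callable stands in for a fine-tuned model that returns per-residue feature probabilities. `toy_predict`, the `disulfide_bond` label, and the 0.5 threshold are hypothetical choices made only for this example.

```python
from typing import Callable, Dict, List, Tuple

# One dict of feature -> probability per residue (illustrative layout, an assumption).
Profile = List[Dict[str, float]]


def apply_missense(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return seq with the residue at 1-based position pos changed from ref to alt."""
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]


def feature_shift(
    predict: Callable[[str], Profile],
    seq: str,
    pos: int,
    ref: str,
    alt: str,
    threshold: float = 0.5,  # hypothetical decision threshold
) -> Dict[str, List[Tuple[int, str]]]:
    """List residues whose predicted feature call flips between the two alleles."""
    ref_profile = predict(seq)
    alt_profile = predict(apply_missense(seq, pos, ref, alt))
    changes: Dict[str, List[Tuple[int, str]]] = {}
    for i, (r, a) in enumerate(zip(ref_profile, alt_profile), start=1):
        for feature in r:
            was = r[feature] >= threshold
            now = a[feature] >= threshold
            if was != now:
                changes.setdefault(feature, []).append((i, "lost" if was else "gained"))
    return changes


def toy_predict(seq: str) -> Profile:
    """Hypothetical stand-in for a fine-tuned model: flags every cysteine."""
    return [{"disulfide_bond": 0.9 if aa == "C" else 0.1} for aa in seq]


# Replacing the cysteine at position 3 removes the toy model's feature call there.
print(feature_shift(toy_predict, "MKCAC", 3, "C", "S"))
# prints {'disulfide_bond': [(3, 'lost')]}
```

With a real fine-tuned model in place of `toy_predict`, the same comparison could also surface changes at residues away from the variant site, since the loop checks every position rather than only `pos`.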
