Saadat Ali, Fellay Jacques
School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.
Swiss Institute of Bioinformatics, Lausanne, Switzerland.
Comput Struct Biotechnol J. 2025 May 28;27:2199-2207. doi: 10.1016/j.csbj.2025.05.022. eCollection 2025.
Elucidating the functional effects of missense variants is crucial yet challenging. To investigate their impact, we fine-tuned protein language models, including ESM2 and ProtT5, to classify 20 protein features at amino acid resolution. In addition, we trained a fully connected neural network classifier on frozen embeddings and compared its performance to fine-tuning in order to quantify the added value of task-specific adaptation. We then used the fine-tuned models to: 1) identify protein features enriched in either pathogenic or benign missense variants, and 2) compare the predicted feature profiles of proteins with reference and alternate alleles to understand how missense variants affect protein functionality. We show that our models can be used to reclassify variants of uncertain significance and provide mechanistic insights into the functional consequences of missense mutations.
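The comparison of reference and alternate feature profiles described in the abstract can be illustrated with a minimal sketch. It assumes that a per-residue classifier has already produced, for each sequence, one row of feature probabilities per amino acid. The function names (`apply_missense`, `flipped_calls`), the feature names, the 0.5 threshold, and the toy probabilities are all hypothetical and are not taken from the paper; no model is run here.

```python
from typing import List, Sequence, Tuple


def apply_missense(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return `seq` with the residue at 1-based `pos` changed from `ref` to `alt`."""
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]


def flipped_calls(
    ref_probs: Sequence[Sequence[float]],
    alt_probs: Sequence[Sequence[float]],
    features: Sequence[str],
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> List[Tuple[int, str, float, float]]:
    """List (position, feature, p_ref, p_alt) where the thresholded call differs."""
    flips = []
    for pos, (ref_row, alt_row) in enumerate(zip(ref_probs, alt_probs), start=1):
        for name, p_ref, p_alt in zip(features, ref_row, alt_row):
            if (p_ref >= threshold) != (p_alt >= threshold):
                flips.append((pos, name, p_ref, p_alt))
    return flips


# Toy example: a made-up 8-residue protein and an A4V substitution.
ref_seq = "MKTAYIAK"
alt_seq = apply_missense(ref_seq, 4, "A", "V")

# Hypothetical feature names and made-up probabilities standing in for model output.
features = ["active_site", "binding_site"]
background = [0.1, 0.2]
ref_probs = [background] * 3 + [[0.9, 0.7]] + [background] * 4
alt_probs = [background] * 3 + [[0.3, 0.8]] + [background] * 4

flips = flipped_calls(ref_probs, alt_probs, features)
print(alt_seq, flips)
```

In this toy case only the `active_site` call at position 4 changes between the two alleles. A real analysis would replace the hand-written probabilities with the outputs of the fine-tuned models for the reference and alternate sequences.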