TLmutation: Predicting the Effects of Mutations Using Transfer Learning.

Author information

Shamsi Zahra, Chan Matthew, Shukla Diwakar

Affiliations

Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States.

Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States.

Publication information

J Phys Chem B. 2020 May 14;124(19):3845-3854. doi: 10.1021/acs.jpcb.0c00197. Epub 2020 May 1.

DOI:10.1021/acs.jpcb.0c00197

PMID:32308006

Abstract

A recurring challenge in bioinformatics is predicting the phenotypic consequences of amino acid variation in proteins. With recent advances in sequencing techniques, sufficient genomic data have become available to train models that predict evolutionary statistical energies, but there are still inadequate experimental data to directly predict functional effects. One approach to overcoming this data scarcity is to apply transfer learning and train more models with the available data sets. In this study, we propose a set of transfer learning algorithms, which we call TLmutation. It first applies a supervised transfer learning algorithm that transfers knowledge from survival data of a protein to a particular function of that protein, and then an unsupervised transfer learning algorithm that extends the knowledge to a homologous protein. We explore the application of our algorithms in three cases. First, we test the supervised transfer on 17 previously published deep mutagenesis data sets to complete and refine missing data points. We further investigate these data sets to identify which mutations build better predictors of variant functions. Second, we apply the algorithm to predict higher-order mutations solely from single point mutagenesis data. Finally, we apply the unsupervised transfer learning algorithm to predict mutational effects of homologous proteins from experimental data sets. We show the benefit of our transfer learning algorithms in utilizing informative deep mutational data and providing new insights into protein variant functions. Because these algorithms are generalized to transfer knowledge between Markov random field models, we expect them to be applicable to other disciplines.
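The abstract refers to statistical energies from Markov random field models and to predicting higher-order mutations. As a rough illustration only, the sketch below scores sequences under a generic pairwise Markov random field (a Potts-style model with site terms and pairwise couplings) and compares a double mutant with the sum of its single mutants. It is not the TLmutation implementation. The function names, the toy parameters `h` and `J`, the two-letter alphabet, and the sign convention are all assumptions made for this example.

```python
from itertools import combinations


def statistical_energy(seq, h, J):
    """Energy of a sequence under a pairwise Markov random field.

    seq: list of residue indices, one per position
    h:   h[i][a] is the site term for residue a at position i
    J:   J[(i, j)][a][b] is the coupling for residues a, b at positions i < j
    """
    site = sum(h[i][a] for i, a in enumerate(seq))
    pair = sum(J[(i, j)][seq[i]][seq[j]]
               for i, j in combinations(range(len(seq)), 2))
    return site + pair


def mutation_effect(wild_type, mutations, h, J):
    """Energy difference between a mutant and the wild type.

    mutations: dict mapping position -> new residue index
    """
    mutant = list(wild_type)
    for pos, res in mutations.items():
        mutant[pos] = res
    return statistical_energy(mutant, h, J) - statistical_energy(wild_type, h, J)


# Toy model: 3 positions, 2-letter alphabet. All values are made up.
h = [[0.0, -1.0], [0.0, 0.5], [0.0, -0.25]]
J = {
    (0, 1): [[0.0, 0.0], [0.0, 2.0]],  # positions 0 and 1 are coupled
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 0.0], [0.0, 0.0]],
}
wt = [0, 0, 0]

single_a = mutation_effect(wt, {0: 1}, h, J)
single_b = mutation_effect(wt, {1: 1}, h, J)
double = mutation_effect(wt, {0: 1, 1: 1}, h, J)

# Non-additive part of the double mutant, carried by the coupling term.
epistasis = double - (single_a + single_b)
print(single_a, single_b, double, epistasis)
```

In this toy model the double-mutant score differs from the sum of the two single-mutant scores only through the coupling between positions 0 and 1. That is the sense in which a pairwise model can say something about higher-order variants that a purely additive site-wise model cannot.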
