Pareto-optimal sampling for multi-objective protein sequence design.

Authors

Luo Jiaqi, Ding Kerr, Luo Yunan

Affiliation

School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30308, USA.

Publication information

iScience. 2025 Feb 27;28(3):112119. doi: 10.1016/j.isci.2025.112119. eCollection 2025 Mar 21.

DOI:10.1016/j.isci.2025.112119
PMID:40160427
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11952807/
Abstract

Supervised machine learning (ML) has significantly advanced sequence-based protein property prediction. However, its inverse application, designing protein sequences with desired properties, remains under-explored. The challenges in sequence design stem from the vast search space and the rugged protein fitness landscape. In this work, we present MosPro, an efficient ML algorithm for property-guided protein sequence design. We frame sequence design as a discrete sampling problem. Utilizing a pre-trained differentiable ML model that predicts properties of sequences, MosPro shapes a distribution that assigns high probability mass to regions for high-property sequences. To generate designs, MosPro efficiently samples sequences from this constructed distribution. We further develop a Pareto optimization algorithm to propose sequences that are simultaneously optimized for multiple properties. Evaluations on experimental fitness landscapes demonstrated that MosPro generates sequences that optimally trade off multiple desiderata. Our results suggested an unparalleled potential of generative ML for efficient and controllable design for functional proteins.
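The idea of treating design as discrete sampling can be made concrete with a small sketch. The abstract does not describe MosPro's sampler in detail, so the code below is not the authors' method: it swaps in a plain single-site Metropolis sampler that targets a distribution proportional to exp(score / temperature), so that sequences with higher predicted property receive more probability mass. The names `toy_property` and `metropolis_sample`, the hydrophobic-fraction score, and the temperature are all assumptions made for illustration; a real setup would plug in a trained property predictor.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_property(seq: str) -> float:
    """Hypothetical stand-in for a learned predictor: fraction of hydrophobic residues."""
    return sum(c in "AILMFVW" for c in seq) / len(seq)


def metropolis_sample(start: str, score, temperature: float = 0.05,
                      steps: int = 2000, seed: int = 0) -> list[str]:
    """Sample sequences from p(x) proportional to exp(score(x) / temperature).

    Each proposal sets one uniformly chosen position to a uniformly chosen
    residue. This proposal is symmetric, so the acceptance probability
    reduces to the ratio of target probabilities.
    """
    rng = random.Random(seed)
    current = start
    current_score = score(current)
    samples = []
    for _ in range(steps):
        pos = rng.randrange(len(current))
        new_res = rng.choice(AMINO_ACIDS)
        proposal = current[:pos] + new_res + current[pos + 1:]
        proposal_score = score(proposal)
        # Accept with probability min(1, exp((s_new - s_old) / T)).
        log_ratio = (proposal_score - current_score) / temperature
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_score = proposal, proposal_score
        samples.append(current)
    return samples


if __name__ == "__main__":
    chain = metropolis_sample("G" * 20, toy_property)
    print(chain[0], toy_property(chain[0]))
    print(chain[-1], toy_property(chain[-1]))
```

Lowering `temperature` concentrates the samples on high-scoring sequences, while raising it yields more diverse designs. Uniform single-site proposals ignore the predictor's differentiability, which the abstract highlights, so this sketch should be read only as a baseline illustration of the sampling view.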

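The notion of sequences that "optimally trade off multiple desiderata" rests on Pareto optimality: a design is kept if no other design is at least as good on every property and strictly better on at least one. The sketch below shows only this generic non-dominated filtering step, assuming every property is to be maximized. It is not the Pareto optimization algorithm developed in the paper, and the design names and scores are made up for illustration.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Return the names of non-dominated candidates.

    candidates maps a design name to a tuple of objective scores.
    """
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
    ]


if __name__ == "__main__":
    # Hypothetical (property 1, property 2) predictions for four designs.
    designs = {
        "seq_a": (0.9, 0.2),
        "seq_b": (0.6, 0.6),
        "seq_c": (0.5, 0.5),
        "seq_d": (0.1, 0.95),
    }
    print(pareto_front(designs))
```

In this example `seq_c` is dropped because `seq_b` is better on both properties, while the other three remain because each is best under some trade-off. The pairwise comparison is quadratic in the number of candidates, which is acceptable for small design batches.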

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3883/11952807/0121b3f8e129/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3883/11952807/f6e0e16ab580/fx1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3883/11952807/93421fc8f2c0/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3883/11952807/5a1644d99bfd/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3883/11952807/58e4d1f4d49a/gr3.jpg

