Deep generative models of genetic variation capture the effects of mutations.

Affiliations

Department of Systems Biology, Harvard Medical School, Boston, MA, USA.

Program in Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Publication information

Nat Methods. 2018 Oct;15(10):816-822. doi: 10.1038/s41592-018-0138-4. Epub 2018 Sep 24.

DOI:10.1038/s41592-018-0138-4
PMID:30250057
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6693876/
Abstract

The functions of proteins and RNAs are defined by the collective interactions of many residues, and yet most statistical models of biological sequences consider sites nearly independently. Recent approaches have demonstrated benefits of including interactions to capture pairwise covariation, but leave higher-order dependencies out of reach. Here we show how it is possible to capture higher-order, context-dependent constraints in biological sequences via latent variable models with nonlinear dependencies. We found that DeepSequence (https://github.com/debbiemarkslab/DeepSequence), a probabilistic model for sequence families, predicted the effects of mutations across a variety of deep mutational scanning experiments substantially better than existing methods based on the same evolutionary data. The model, learned in an unsupervised manner solely on the basis of sequence information, is grounded with biologically motivated priors, reveals the latent organization of sequence families, and can be used to explore new parts of sequence space.
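
The contrast the abstract draws, between models that treat sites nearly independently and models that capture dependencies among sites, can be made concrete with a toy version of the simpler side. The sketch below is not DeepSequence and does not use its code or API. It is a minimal site-independent baseline, assuming a gap-free toy alignment, a 20-letter amino acid alphabet, and a pseudocount of 1; the function names `fit_site_independent`, `log_prob`, and `mutation_effect` are hypothetical and chosen only for illustration. It scores a mutation as the log-probability ratio of the mutant to the wild-type sequence under the fitted model.

```python
import math
from collections import Counter

# Assumption: standard 20 amino acids, no gap characters in the alignment.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def fit_site_independent(alignment, pseudocount=1.0):
    """Estimate per-column amino acid log-probabilities from aligned sequences."""
    length = len(alignment[0])
    total = len(alignment) + pseudocount * len(ALPHABET)
    model = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment)
        # Counter returns 0 for residues never seen in this column.
        model.append(
            {aa: math.log((counts[aa] + pseudocount) / total) for aa in ALPHABET}
        )
    return model


def log_prob(model, seq):
    """Log-probability of a sequence as a sum over independent sites."""
    return sum(site[aa] for site, aa in zip(model, seq))


def mutation_effect(model, wild_type, mutant):
    """Log-ratio score: negative values mean the mutant is less likely than wild type."""
    return log_prob(model, mutant) - log_prob(model, wild_type)


# Toy family of four aligned sequences (illustrative data, not from the paper).
alignment = ["ACD", "ACD", "ACE", "GCD"]
model = fit_site_independent(alignment)

seen = mutation_effect(model, "ACD", "GCD")    # G occurs at position 1 in the family
unseen = mutation_effect(model, "ACD", "WCD")  # W never occurs at position 1
print(f"A1G: {seen:.3f}  A1W: {unseen:.3f}")
```

Because each site contributes its own term, this baseline assigns a substitution the same score regardless of the residues elsewhere in the sequence. That is the limitation the abstract refers to: pairwise models add couplings between pairs of sites, and the latent variable model described here is meant to capture higher-order, context-dependent constraints.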
