Inference and visualization of complex genotype-phenotype maps with gpmap-tools.

Author information

Martí-Gómez Carlos, Zhou Juannan, Chen Wei-Chia, Kinney Justin B, McCandlish David M

Affiliations

Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724.

Department of Biology, University of Florida, Gainesville, FL, 32611.

Publication information

bioRxiv. 2025 Mar 15:2025.03.09.642267. doi: 10.1101/2025.03.09.642267.

DOI:10.1101/2025.03.09.642267

PMID:40161830

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11952336/

Abstract

Multiplex assays of variant effect (MAVEs) allow the functional characterization of an unprecedented number of sequence variants in both gene regulatory regions and protein coding sequences. This has enabled the study of nearly complete combinatorial libraries of mutational variants and revealed the widespread influence of higher-order genetic interactions that arise when multiple mutations are combined. However, the lack of appropriate tools for exploratory analysis of these high-dimensional data limits our overall understanding of the main qualitative properties of complex genotype-phenotype maps. To fill this gap, we have developed gpmap-tools (https://github.com/cmarti/gpmap-tools), a library that integrates Gaussian process models for inference, phenotypic imputation, and error estimation from incomplete and noisy MAVE data and collections of natural sequences, together with methods for summarizing patterns of higher-order epistasis and non-linear dimensionality reduction techniques that allow visualization of genotype-phenotype maps containing up to millions of genotypes. Here, we used gpmap-tools to study the genotype-phenotype map of the Shine-Dalgarno sequence, a motif that modulates binding of the 16S rRNA to the 5' untranslated region (UTR) of mRNAs through base pair complementarity during translation initiation in prokaryotes. We inferred full combinatorial landscapes containing 262,144 different sequences from the sequences of 5,311 5'UTRs in the genome and from experimental MAVE data. Visualizations of the inferred landscapes were largely consistent with each other, and unveiled a simple molecular mechanism underlying the highly epistatic genotype-phenotype map of the Shine-Dalgarno sequence.
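The figure of 262,144 sequences quoted in the abstract equals 4^9, which is the number of possible sequences of length 9 over a four-letter nucleotide alphabet. The abstract does not state the sequence length, so the length of 9 and the ACGU alphabet below are assumptions inferred from that arithmetic. The following minimal sketch uses only the Python standard library (it does not use or represent the gpmap-tools API) to enumerate such a sequence space and count the single-mutation neighbors of one genotype:

```python
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet (not stated in the abstract)
LENGTH = 9         # assumed length, inferred from 4**9 == 262,144


def all_sequences(alphabet=ALPHABET, length=LENGTH):
    """Yield every sequence of the given length over the alphabet."""
    for letters in product(alphabet, repeat=length):
        yield "".join(letters)


def single_mutants(seq, alphabet=ALPHABET):
    """Return all sequences that differ from seq at exactly one position."""
    return [
        seq[:i] + letter + seq[i + 1:]
        for i in range(len(seq))
        for letter in alphabet
        if letter != seq[i]
    ]


# Size of the complete combinatorial landscape.
n_genotypes = sum(1 for _ in all_sequences())

# Each genotype has LENGTH * (len(ALPHABET) - 1) single-mutation neighbors.
n_neighbors = len(single_mutants("A" * LENGTH))

print(n_genotypes, n_neighbors)
```

Under these assumptions each genotype has 27 single-mutation neighbors among 262,144 genotypes, which gives a sense of why a landscape of this size needs dedicated dimensionality reduction methods to be visualized.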

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e7b/11952336/02dd08fdf1df/nihpp-2025.03.09.642267v2-f0001.jpg
