Predictive Models of Genetic Redundancy in Arabidopsis thaliana.

Authors

Cusack Siobhan A, Wang Peipei, Lotreck Serena G, Moore Bethany M, Meng Fanrui, Conner Jeffrey K, Krysan Patrick J, Lehti-Shiu Melissa D, Shiu Shin-Han

Affiliations

Cell and Molecular Biology Program, Michigan State University, East Lansing, MI, USA.

Department of Plant Biology, Michigan State University, East Lansing, MI, USA.

Publication information

Mol Biol Evol. 2021 Jul 29;38(8):3397-3414. doi: 10.1093/molbev/msab111.

DOI:10.1093/molbev/msab111

PMID:33871641

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8321531/

Abstract

Genetic redundancy refers to a situation where an individual with a loss-of-function mutation in one gene (single mutant) does not show an apparent phenotype until one or more paralogs are also knocked out (double/higher-order mutant). Previous studies have identified some characteristics common among redundant gene pairs, but a predictive model of genetic redundancy incorporating a wide variety of features derived from accumulating omics and mutant phenotype data is yet to be established. In addition, the relative importance of these features for genetic redundancy remains largely unclear. Here, we establish machine learning models for predicting whether a gene pair is likely redundant or not in the model plant Arabidopsis thaliana based on six feature categories: functional annotations, evolutionary conservation including duplication patterns and mechanisms, epigenetic marks, protein properties including posttranslational modifications, gene expression, and gene network properties. The definition of redundancy, data transformations, feature subsets, and machine learning algorithms used significantly affected model performance based on holdout, testing phenotype data. Among the most important features in predicting gene pairs as redundant were having a paralog(s) from recent duplication events, annotation as a transcription factor, downregulation during stress conditions, and having similar expression patterns under stress conditions. We also explored the potential reasons underlying mispredictions and limitations of our studies. This genetic redundancy model sheds light on characteristics that may contribute to long-term maintenance of paralogs, and will ultimately allow for more targeted generation of functionally informative double mutants, advancing functional genomic studies.
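
Two ideas from the abstract can be illustrated with a minimal Python sketch: the feature "similar expression patterns under stress conditions" for a gene pair, and the evaluation on holdout data. This is not the authors' pipeline. The choice of Pearson correlation as the similarity measure, the function names pearson and stratified_holdout, the 20% test fraction, and all numeric values below are assumptions made purely for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # A flat profile has no pattern to compare; treated as uncorrelated
        # here (an assumption of this sketch).
        return 0.0
    return cov / (sx * sy)


def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Split gene-pair indices into train/test sets, keeping the class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Hypothetical log fold-change values of two paralogs across four stress
# treatments (made-up numbers, not data from the paper).
gene_a = [-1.2, -0.8, 0.1, -1.5]
gene_b = [-1.0, -0.9, 0.3, -1.4]
similarity = pearson(gene_a, gene_b)

# Hypothetical labels for ten gene pairs: 1 = redundant, 0 = not redundant.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
train_idx, test_idx = stratified_holdout(labels, test_fraction=0.2)
```

In a fuller model, a value such as similarity would be one column among many features per gene pair, and the held-out indices would be set aside until the final performance estimate.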

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9dac/8321531/e260e6e435ee/msab111f1.jpg
