Department of Plant and Microbial Biology, University of California, Berkeley, USA.
Genome Biol Evol. 2013;5(4):646-60. doi: 10.1093/gbe/evt035.
Following polyploidy, duplicate genes are often deleted; when they are retained, their duplicate regulatory regions are sometimes lost instead. By what mechanism does this loss occur, and what is the chance that such a loss removes function? To explore these questions, we followed individual Arabidopsis thaliana-A. thaliana conserved noncoding sequences (CNSs) into the Brassica ancestor, through a paleohexaploidy, and into Brassica rapa. Thus, a single Brassicaceae CNS has six potential orthologous positions in B. rapa; a single Arabidopsis CNS has three potential homeologous positions. We reasoned that a CNS present on a singlet Brassica gene would be less likely to lose function than a more redundant CNS, and this is the case: redundant CNSs often become nondetectable. Using this logic, each mechanism of CNS loss was assigned a metric of functionality. By definition, proved deletions do not function as sequence. Our results indicated that CNSs that become nondetectable by base substitution or large insertion are almost certainly still functional (redundancy does not matter much to their detectability frequency), whereas those lost by inferred deletion or indels are approximately 75% likely to be nonfunctional. Overall, an average nondetectable, once-redundant CNS more than 30 bp in length has a 72% chance of being nonfunctional. This makes sense because 97% of them sort to a molecular mechanism with "deletion" in its description, although base substitutions do cause loss. Similarly, proved-functional G-boxes become undetectable by deletion 82% of the time. Fractionation mutagenesis is a procedure that uses polyploidy as a mutagenic agent to genetically alter RNA expression profiles and then to construct testable hypotheses about the function of the lost regulatory site. We show fractionation mutagenesis to be a "deletion machine" in the Brassica lineage.
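The consistency between the 97%, 75%, and 72% figures can be checked with a rough weighted average. The sketch below is only a back-of-envelope illustration, not the authors' calculation: it assumes just two classes, treats every "deletion"-class CNS as 75% likely to be nonfunctional, and treats the remaining 3% as fully functional. The function name and the two-class split are hypothetical simplifications.

```python
def expected_nonfunctional(classes):
    """Weighted average of per-class nonfunctional probabilities.

    classes: list of (share_of_lost_CNSs, probability_nonfunctional) pairs.
    The shares must sum to 1.
    """
    total_share = sum(share for share, _ in classes)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("class shares must sum to 1")
    return sum(share * prob for share, prob in classes)


# Assumed two-class simplification of the abstract's figures:
classes = [
    (0.97, 0.75),  # mechanisms with "deletion" in the description, ~75% nonfunctional
    (0.03, 0.0),   # base substitution / large insertion, assumed still functional
]

estimate = expected_nonfunctional(classes)
print(f"{estimate:.4f}")
```

Under these assumptions the estimate is about 73%, close to the reported 72%. The small difference is expected, since the real analysis distinguishes more mechanisms than this two-class split.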