Thomas Brian C, Pedersen Brent, Freeling Michael
College of Natural Resources, University of California-Berkeley, Berkeley, California 94720, USA.
Genome Res. 2006 Jul;16(7):934-46. doi: 10.1101/gr.4708406. Epub 2006 Jun 7.
Approximately 90% of Arabidopsis' unique gene content is found in syntenic blocks that were formed during the most recent whole-genome duplication. Within these blocks, 28.6% of the genes have a retained pair; the remaining genes have been lost from one of the homeologs. We create a minimized genome by condensing local duplications to one gene, removing transposons, and including only genes within blocks defined by retained pairs. We use a moving average of retained and non-retained genes to find clusters of retention and then identify the types of genes that appear in clusters at frequencies above expectations. Significant clusters of retention exist for almost all chromosomal segments. Detailed alignments show that, for 85% of the genome, one homeolog was preferentially (1.6x) targeted for fractionation. This homeolog fractionation bias suggests an epigenetic mechanism. We find that islands of retention contain "connected genes," those genes predicted, by the gene balance hypothesis, to be resistant to removal because the products they encode interact with other products in a dose-sensitive manner, creating a web of dependency. Gene families that are overrepresented in clusters include those encoding components of the proteasome/protein modification complexes, signal transduction machinery, ribosomes, and transcription factor complexes. Gene pair fractionation following polyploidy or segmental duplication leaves a genome enriched for "connected" genes. These clusters of duplicate genes may help explain the evolutionary origin of coregulated chromosomal regions and new functional modules.
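The moving-average step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each gene along a minimized chromosomal segment is coded 1 if its duplicate pair was retained and 0 otherwise. The function names, the window size, and the cutoff are hypothetical choices for illustration; the abstract does not state the parameters or the significance test the authors used.

```python
from typing import List, Tuple


def moving_average(flags: List[int], window: int) -> List[float]:
    """Mean retention (fraction of 1s) for every full window of `window` genes."""
    if window <= 0 or window > len(flags):
        raise ValueError("window must be between 1 and len(flags)")
    total = sum(flags[:window])
    means = [total / window]
    # Slide the window one gene at a time, updating the running sum.
    for i in range(window, len(flags)):
        total += flags[i] - flags[i - window]
        means.append(total / window)
    return means


def retention_clusters(
    flags: List[int], window: int, threshold: float
) -> List[Tuple[int, int]]:
    """Merge windows whose mean retention >= threshold into [start, end) ranges.

    `threshold` is an illustrative cutoff (an assumption), not a value
    taken from the paper.
    """
    clusters: List[Tuple[int, int]] = []
    for start, mean in enumerate(moving_average(flags, window)):
        if mean < threshold:
            continue
        end = start + window
        if clusters and start <= clusters[-1][1]:
            # Overlapping or adjacent to the previous cluster: extend it.
            clusters[-1] = (clusters[-1][0], max(clusters[-1][1], end))
        else:
            clusters.append((start, end))
    return clusters


# Hypothetical segment of ten genes: 1 = retained pair, 0 = fractionated.
example_flags = [0, 0, 1, 1, 1, 0, 0, 0, 0, 1]
example_clusters = retention_clusters(example_flags, window=3, threshold=0.6)
```

In practice the cutoff would be set against an expectation, for example by comparing observed window means with those from shuffled gene orders, rather than fixed by hand as it is here.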