Mendez Gregory S, Delwiche Charles F, Apt Kirk E, Lippmeier J Casey
Department of Cell Biology and Molecular Genetics, University of Maryland College Park, College Park, Maryland, 20742-5815.
Maryland Agricultural Experiment Station, College Park, Maryland, 20742.
J Eukaryot Microbiol. 2015 Sep-Oct;62(5):679-87. doi: 10.1111/jeu.12230. Epub 2015 Jun 8.
Dinoflagellates are one of the last major eukaryotic lineages whose genome structure and organization remain poorly understood. We report here the sequence and gene structure of a clone isolated from a cosmid library which, to our knowledge, is the largest contiguously sequenced tandem gene array from a dinoflagellate genome. These data, combined with information from a large transcriptomic library, allowed every base pair to be called with high confidence, a level of confidence not possible with PCR-based contigs. The sequence contains an intron-rich set of five highly expressed gene repeats arranged in tandem. One member of the tandem repeat contains an intron 26,372 bp long. This study characterizes a splice site consensus sequence for dinoflagellate introns: a stretch of two to nine base pairs spanning the 3' splice site is repeated identically as a stretch of the same two to nine base pairs spanning the 5' splice site. Because the 5' and 3' splice sites occupy the same position within each copy of the repeat, the repeat appears only once in the mature mRNA. This identically repeated intron boundary sequence might be useful for gene modeling and genome annotation.
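The boundary repeat described above can be illustrated with a short sketch. Assuming 0-based intron coordinates (`intron_start` is the first intron base, `intron_end` is one past the last), the hypothetical function `boundary_repeat` below measures how far the bases just before each splice site match each other (leftward) and how far the bases starting at each splice site match each other (rightward). Together these give the identical sequence found at both intron boundaries. The toy sequence is invented for illustration and is not data from this study, and this is not the authors' annotation method.

```python
def boundary_repeat(genome: str, intron_start: int, intron_end: int) -> tuple[int, int]:
    """Hypothetical helper: length of the identical sequence at both intron ends.

    Returns (left, right): the repeat at the 5' site is
    genome[intron_start - left : intron_start + right], and the identical copy
    at the 3' site is genome[intron_end - left : intron_end + right].
    """
    g = genome.upper()
    # Leftward: compare the exon base before the 5' site with the last intron base,
    # then keep stepping back while the two positions still match.
    left = 0
    while (intron_start - 1 - left >= 0
           and intron_end - 1 - left >= intron_start
           and g[intron_start - 1 - left] == g[intron_end - 1 - left]):
        left += 1
    # Rightward: compare the first intron base with the first base of the next exon.
    right = 0
    while (intron_end + right < len(g)
           and intron_start + right < intron_end
           and g[intron_start + right] == g[intron_end + right]):
        right += 1
    return left, right


# Toy example (invented): exon "TTTCA", intron "GGTAAGTCCCA", exon "GGAAA".
genome = "TTTCA" + "GGTAAGTCCCA" + "GGAAA"
s, e = 5, 16
left, right = boundary_repeat(genome, s, e)
repeat = genome[s - left:s + right]   # copy spanning the 5' splice site
mrna = genome[:s] + genome[e:]        # intron removed
print(left, right, repeat, mrna.count(repeat))  # 2 2 CAGG 1
```

Because the two copies are identical, shifting both splice sites together anywhere within the repeat yields the same spliced product, so the repeat occurs once in `mrna`. A gene modeler could use a check like this to flag candidate intron boundaries whose repeat length falls in the two to nine base pair range reported here.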