Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, PR China.
J Exp Zool B Mol Dev Evol. 2011 Nov 15;316(7):500-14. doi: 10.1002/jez.b.21421. Epub 2011 Jun 21.
The precise identification of the transcription start sites (TSSs) of genes in the honeybee genome will be helpful for inferring start codons and for determining promoter elements. The 5'SAGE approach provides a powerful tool for identifying TSSs in the sequenced genome. The main purpose of this study is to identify the actual TSSs of expressed genes as well as the usage of different TSSs in the Apis mellifera genome. We performed a 5'LongSAGE (5'LS) analysis for the adult drone head, and the TSSs of the expressed genes were determined by mapping the 5'LS tag sequences to the honeybee genome. A total of 8,280 unique 19 bp 5'LS tag sequences were identified that corresponded to 3,655 predicted genes. Out of these tags, 4,998 tags (60.4%) were mapped to a region from -1,000 bp to +100 bp of the start codon of 2,301 reference coding sequences. Notably, we observed that 28-47% of the 3,655 honeybee genes initiated transcription from alternative TSSs. The TSS consensus pattern of the honeybee genes, DT(rich) PyPu(G(rich))(T/A)(T(rich))(3), was obtained by aligning the sequences flanking the 5'LS-TSSs. We also identified three new genes in the regions downstream of 5'LS tags and validated 21 TSSs using RT-PCR amplification. Additionally, 17 genes identified by the 5'LS tags were associated with the Gene Ontology term "behavior." Mapping of the 5'LS tags on the genome not only provided direct evidence of expression for in silico predicted genes but also allowed for the identification of previously unrecognized, novel exons and alternative TSSs.
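The window-based assignment described above (tags falling between -1,000 bp and +100 bp of a start codon) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The `StartCodon` record, the scaffold names, the coordinates, and the helper functions are all hypothetical, and the sketch assumes each tag has already been aligned to a single genomic position and strand.

```python
from dataclasses import dataclass

UPSTREAM = 1000    # bp upstream of the start codon (the -1,000 bound in the abstract)
DOWNSTREAM = 100   # bp downstream of the start codon (the +100 bound)


@dataclass(frozen=True)
class StartCodon:
    """Hypothetical record for the start codon of a reference coding sequence."""
    gene: str
    scaffold: str
    pos: int       # genomic coordinate of the first base of the start codon
    strand: str    # "+" or "-"


def relative_offset(tag_pos: int, codon: StartCodon) -> int:
    """Signed distance from the start codon in transcript orientation.

    Negative values are upstream of the start codon, positive values downstream.
    On the minus strand the genomic direction is reversed, so the sign flips.
    """
    delta = tag_pos - codon.pos
    return delta if codon.strand == "+" else -delta


def in_window(tag, codon: StartCodon) -> bool:
    """True if a (scaffold, position, strand) tag lies within the window of a codon."""
    scaffold, pos, strand = tag
    if scaffold != codon.scaffold or strand != codon.strand:
        return False
    return -UPSTREAM <= relative_offset(pos, codon) <= DOWNSTREAM


def count_in_window(tags, codons) -> int:
    """Count tags that fall within the window of at least one start codon."""
    return sum(any(in_window(tag, codon) for codon in codons) for tag in tags)


# Toy data, invented for illustration only.
codons = [
    StartCodon("geneA", "Group1", 5000, "+"),
    StartCodon("geneB", "Group1", 20000, "-"),
]
tags = [
    ("Group1", 4200, "+"),   # offset -800: inside the window
    ("Group1", 5100, "+"),   # offset +100: on the downstream bound, inside
    ("Group1", 5101, "+"),   # offset +101: outside
    ("Group1", 20500, "-"),  # minus strand, offset -500: inside
    ("Group1", 18000, "-"),  # minus strand, offset +2000: outside
    ("Group2", 4200, "+"),   # different scaffold: outside
]

mapped = count_in_window(tags, codons)
fraction_reported = round(100 * 4998 / 8280, 1)  # figures quoted in the abstract
```

With the toy data, three of the six tags fall inside the window. The last line recomputes the reported proportion from the abstract's own counts (4,998 of 8,280 tags), which rounds to 60.4%.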