Wang Quan, Wan Lin, Li Dayong, Zhu Lihuang, Qian Minping, Deng Minghua
Center for Theoretical Biology, Peking University, Beijing 100871, PR China.
BMC Bioinformatics. 2009 Jan 30;10 Suppl 1(Suppl 1):S29. doi: 10.1186/1471-2105-10-S1-S29.
A "bidirectional gene pair" is defined as two adjacent genes located on opposite strands of DNA with transcription start sites (TSSs) not more than 1000 base pairs apart; the intergenic region between the two TSSs is commonly designated a putative "bidirectional promoter". Individual examples of bidirectional gene pairs have been reported for years, and a few genome-wide analyses have been carried out in mammalian and human genomes. However, no genome-wide analysis of bidirectional genes in plants has been done. Furthermore, the exact mechanism underlying this gene organization is still poorly understood.
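The definition above can be made concrete with a minimal sketch. This is not the authors' pipeline: the `Gene` record, the function names, and the toy coordinates are hypothetical, and the code assumes that "the intergenic region between the two TSSs" implies a head-to-head (divergent) arrangement, with the minus-strand gene lying to the left of the plus-strand gene.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are on the forward strand."""
    name: str
    chrom: str
    start: int   # leftmost coordinate
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"


def tss(gene):
    """TSS is the start for a plus-strand gene and the end for a minus-strand gene."""
    return gene.start if gene.strand == "+" else gene.end


def find_bidirectional_pairs(genes, max_dist=1000):
    """Return (left_name, right_name, tss_distance) for adjacent divergent genes.

    Assumption: a pair qualifies when the left gene is on "-" and the right
    gene is on "+", so the region between the two TSSs is intergenic.
    """
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    pairs = []
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: g.start)
        # Only neighbouring genes (in start order) are compared.
        for left, right in zip(chrom_genes, chrom_genes[1:]):
            if left.strand == "-" and right.strand == "+":
                dist = tss(right) - tss(left)
                if 0 < dist <= max_dist:
                    pairs.append((left.name, right.name, dist))
    return pairs


# Toy data, invented for illustration only.
example_genes = [
    Gene("g1", "chr1", 100, 500, "-"),
    Gene("g2", "chr1", 900, 1500, "+"),
    Gene("g3", "chr1", 1600, 2000, "+"),
    Gene("g4", "chr1", 5000, 6000, "-"),
    Gene("g5", "chr1", 8000, 9000, "+"),
]
print(find_bidirectional_pairs(example_genes))
```

In the toy data only g1 and g2 qualify: g4 and g5 are divergent but their TSSs are 2000 bp apart, which exceeds the 1000 bp threshold.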
We conducted a comprehensive analysis of bidirectional gene pairs across the whole Arabidopsis thaliana genome and identified 2471 bidirectional gene pairs. The analysis shows that bidirectional genes are often coexpressed and tend to be involved in the same biological function. Furthermore, bidirectional gene pairs associated with similar functions seem to have stronger expression correlation. We focused in particular on regulatory analysis of the intergenic regions between bidirectional genes. Using a hierarchical stochastic language model (HSL) that we developed, we identified intergenic regions enriched in regulatory elements that are essential for the initiation of transcription. Finally, we selected 27 functionally associated bidirectional gene pairs whose intergenic regions are enriched in regulatory elements and hypothesized that they are regulated by bidirectional promoters; some of these pairs have the same orthologs in ancient organisms. More than half of these bidirectional gene pairs are further supported by sharing similar functional categories with those of the handful of experimentally verified bidirectional genes.
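As an illustration of what "expression correlation" between the two genes of a pair might look like in practice, the sketch below computes a Pearson correlation coefficient over expression profiles. The abstract does not state which correlation measure was used, so Pearson is an assumption here; the function names and the expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return cov / sqrt(var_x * var_y)


def pair_correlations(pairs, expression):
    """Map each (gene_a, gene_b) pair to the correlation of its two profiles."""
    return {
        (a, b): pearson(expression[a], expression[b])
        for a, b in pairs
    }


# Invented expression values across four conditions, for illustration only.
toy_expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
correlations = pair_correlations([("g1", "g2"), ("g1", "g3")], toy_expression)
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

A genome-wide comparison would contrast the distribution of such coefficients for bidirectional pairs against a background of other gene pairs; that step is omitted here.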
We conclude that bidirectional gene pairs are also prevalent in plant genomes. Promoter analysis of the intergenic regions between bidirectional genes could be a new way to study bidirectional gene structure and may provide an important clue for further analysis. Such a method could be applied to other genomes.