Lewis E B, Knafels J D, Mathog D R, Celniker S E
Division of Biology, California Institute of Technology, Pasadena 91125, USA.
Proc Natl Acad Sci U S A. 1995 Aug 29;92(18):8403-7. doi: 10.1073/pnas.92.18.8403.
The bithorax complex (BX-C) of Drosophila, one of two complexes that act as master regulators of the body plan of the fly, has now been entirely sequenced and comprises approximately 315,000 bp, only 1.4% of which codes for protein. Analysis of this sequence reveals significantly overrepresented DNA motifs of unknown, as well as known, functions in the non-protein-coding portion of the sequence. The following types of motifs in that portion are analyzed: (i) concatamers of mono-, di-, and trinucleotides; (ii) tightly clustered hexanucleotides (spaced ≤ 5 bases apart); (iii) direct and reverse repeats longer than 20 bp; and (iv) a number of motifs known from biochemical studies to play a role in the regulation of the BX-C. The hexanucleotide AGATAC is remarkably overrepresented and is surmised to play a role in chromosome pairing. The positions of sites of highly overrepresented motifs are plotted for those that occur at more than five sites in the sequence when fewer than 0.5 occurrences are expected. Expected values are based on a third-order Markov chain, which is the optimal order for representing the BXCALL sequence.
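To illustrate how an expected value under a third-order Markov chain can be obtained, the following is a minimal sketch and not the authors' actual procedure. It assumes a single-strand DNA string over the alphabet A, C, G, T, estimates transition probabilities from the tetranucleotide counts of the sequence itself, and multiplies the resulting word probability by the number of possible start positions. The function names `kmer_counts`, `expected_count`, and `observed_count` are hypothetical and introduced only for this example.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumption: plain DNA, no ambiguity codes


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count(seq, word, order=3):
    """Expected occurrences of word under a Markov chain of the given
    order whose parameters are estimated from seq itself."""
    if len(word) <= order:
        raise ValueError("word must be longer than the Markov order")
    contexts = kmer_counts(seq, order)         # trinucleotides for order 3
    transitions = kmer_counts(seq, order + 1)  # tetranucleotides for order 3

    # Probability that a window starts with the first `order` bases of word.
    prob = contexts[word[:order]] / (len(seq) - order + 1)

    # Multiply by P(next base | preceding `order` bases) for each later base.
    for i in range(order, len(word)):
        context = word[i - order:i]
        total = sum(transitions[context + base] for base in ALPHABET)
        if total == 0:
            return 0.0  # context never followed by a base: word cannot occur
        prob *= transitions[context + word[i]] / total

    # Scale by the number of positions at which the word could start.
    return prob * (len(seq) - len(word) + 1)


def observed_count(seq, word):
    """Count overlapping occurrences of word on the given strand."""
    last_start = len(seq) - len(word)
    return sum(1 for i in range(last_start + 1) if seq.startswith(word, i))


if __name__ == "__main__":
    toy = "ACGT" * 50  # toy sequence, not BX-C data
    print(observed_count(toy, "ACGTAC"), expected_count(toy, "ACGTAC"))
```

A motif would then be flagged in the spirit of the abstract's criterion when `observed_count` exceeds five while `expected_count` is below 0.5. The sketch counts only one strand and does not attempt to choose the Markov order; both would need to be addressed in a real analysis.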