Young Jason A, Johnson Jeffery R, Benner Chris, Yan S Frank, Chen Kaisheng, Le Roch Karine G, Zhou Yingyao, Winzeler Elizabeth A
Department of Cell Biology, ICND 202, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA.
BMC Genomics. 2008 Feb 7;9:70. doi: 10.1186/1471-2164-9-70.
With the sequence of the Plasmodium falciparum genome and several global mRNA and protein life cycle expression profiling projects now completed, elucidating the underlying networks of transcriptional control important for the progression of the parasite life cycle is highly pertinent to the development of new anti-malarials. To date, relatively little is known regarding the specific mechanisms the parasite employs to regulate gene expression at the mRNA level, with studies of the P. falciparum genome sequence having revealed few cis-regulatory elements and associated transcription factors. Although the parasite may rely on mechanisms of transcriptional control drastically different from those used by other eukaryotic organisms, the extreme AT-rich nature of P. falciparum intergenic regions (approximately 90% AT) presents significant challenges to in silico cis-regulatory element discovery.
We have developed an algorithm called Gene Enrichment Motif Searching (GEMS) that uses a hypergeometric-based scoring function and a position-weight matrix optimization routine to identify regulatory elements with high confidence in the nucleotide-biased and repeat sequence-rich P. falciparum genome. When applied to promoter regions of genes contained within 21 co-expression gene clusters generated from P. falciparum life cycle microarray data using the semi-supervised clustering algorithm Ontology-based Pattern Identification, GEMS identified 34 putative cis-regulatory elements associated with a variety of parasite processes including sexual development, cell invasion, antigenic variation and protein biosynthesis. Among these candidates were novel motifs, as well as many of the elements for which biological experimental evidence already exists in the Plasmodium literature. To provide evidence for the biological relevance of a cell invasion-related element predicted by GEMS, reporter gene and electrophoretic mobility shift assays were conducted.
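To illustrate what a hypergeometric-based enrichment score can look like, the following is a minimal sketch and not the GEMS implementation. It assumes a motif is scored by the probability of seeing at least the observed number of motif-containing promoters in a gene cluster by chance, given how often the motif occurs across all promoters. The function names (`hypergeom_tail`, `motif_enrichment`), the exact-substring match, and the dictionary input format are all hypothetical simplifications; GEMS itself works with position-weight matrices.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """Return P(X >= k) for X ~ Hypergeometric.

    N: total number of promoters (population size)
    K: promoters in the population that contain the motif
    n: promoters in the cluster (number drawn)
    k: promoters in the cluster that contain the motif
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability mass from k up to the largest possible overlap.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / total


def motif_enrichment(motif, promoters, cluster_genes):
    """Score a motif for enrichment in a cluster (smaller means more enriched).

    promoters: dict mapping gene ID -> promoter sequence (hypothetical format)
    cluster_genes: iterable of gene IDs belonging to one co-expression cluster
    Assumption: a promoter "contains" the motif if the string occurs exactly;
    this stands in for a position-weight matrix match.
    """
    with_motif = {gene for gene, seq in promoters.items() if motif in seq}
    cluster = set(cluster_genes) & set(promoters)
    return hypergeom_tail(
        N=len(promoters),
        K=len(with_motif),
        n=len(cluster),
        k=len(with_motif & cluster),
    )
```

A low tail probability means the motif occurs in the cluster more often than its genome-wide frequency would predict. In an AT-rich genome this matters because a motif that is common everywhere gains little score from being common in the cluster too.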
This GEMS analysis demonstrates that in silico regulatory element discovery can be successfully applied to challenging repeat-sequence-rich, base-biased genomes such as that of P. falciparum. The fact that regulatory elements were predicted from a diverse range of functional gene clusters supports the hypothesis that cis-regulatory elements play a role in the transcriptional control of many P. falciparum biological processes. The putative regulatory elements described represent promising candidates for future biological investigation into the underlying transcriptional control mechanisms of gene regulation in malaria parasites.