Wu Jie, Sieglaff Douglas H, Gervin Joshua, Xie Xiaohui S
Department of Computer Sciences, University of California, Irvine, CA 92697, CODA Genomics, Laguna Hills, CA 92656, USA.
Bioinformatics. 2008 Sep 1;24(17):1843-9. doi: 10.1093/bioinformatics/btn348. Epub 2008 Jul 8.
Understanding gene regulation in Plasmodium, the causative agent of malaria, is an important step toward deciphering its complex life cycle and may lead to new targets for therapeutic applications. Very little is known about gene regulation in Plasmodium; in particular, few regulatory elements have been identified. Their discovery has been significantly hampered by the high A-T content of some Plasmodium genomes, and by the difficulty of associating discovered regulatory elements with gene regulatory cascades, given Plasmodium's complex life cycle.
We report a new method that uses comparative genomics to systematically discover motifs in Plasmodium without requiring any functional data. Unlike previous methods, our method does not depend on sequence alignments, and is thus particularly suitable for highly divergent genomes. We applied our method to discover regulatory motifs conserved between the human parasite, P. falciparum, and its rodent-infectious relative, P. yoelii. We also tested our procedure on comparisons between P. falciparum and the primate-infectious P. knowlesi. Our computational effort leads to an initial catalog of 38 distinct motifs, corresponding to over 16,200 sites in the Plasmodium genome. The functionality of these motifs was further supported by their defined distribution within the genome as well as by a correlation with gene expression patterns. This initial map provides a systematic view of gene regulation in Plasmodium, which can be refined as additional genomes become available.
The new algorithm, named motif discovery using orthologous sequences (MDOS), is available at http://www.ics.uci.edu/~xhx/project/mdos/.
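The abstract does not spell out how MDOS scores motifs, so the following is only a minimal illustration of the general idea of alignment-free conservation, not the MDOS algorithm itself. It assumes each gene contributes a pair of upstream sequences from two species, and it asks how often a k-mer present upstream of a gene in the first species also appears anywhere upstream of its ortholog in the second. The function names, the fixed k-mer length, and the toy sequences are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of k-mers present in seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conservation_rates(ortholog_pairs, k=6, min_present=1):
    """Alignment-free conservation rate for every k-mer.

    ortholog_pairs: iterable of (seq_a, seq_b) upstream sequences of
    orthologous genes from two species (hypothetical input format).
    For each k-mer, the rate is the fraction of genes that carry it in
    species A whose ortholog in species B also carries it, regardless
    of position. No sequence alignment is involved.
    """
    present = Counter()    # genes with the k-mer in species A
    conserved = Counter()  # genes with the k-mer in both species
    for seq_a, seq_b in ortholog_pairs:
        in_a = kmers(seq_a, k)
        in_b = kmers(seq_b, k)
        present.update(in_a)
        conserved.update(in_a & in_b)
    return {
        motif: conserved[motif] / count
        for motif, count in present.items()
        if count >= min_present
    }


# Toy example with made-up sequences: TGCAT occurs in both members of
# the first two pairs, at different positions.
pairs = [
    ("AAATGCATTT", "CCTGCATGGG"),
    ("GGTGCATAAA", "TTTTTGCATC"),
    ("ACGTACGTAC", "TTTTTTTTTT"),
]
rates = conservation_rates(pairs, k=5)
```

In a real analysis a raw rate like this would need to be compared against a background expectation, which matters especially in A-T-rich genomes where A/T-heavy k-mers recur by chance; that step is omitted here.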