Woehle Christian, Kusdian Gary, Radine Claudia, Graur Dan, Landan Giddy, Gould Sven B
Institute of Molecular Evolution, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany.
BMC Genomics. 2014 Oct 17;15(1):906. doi: 10.1186/1471-2164-15-906.
The human pathogen Trichomonas vaginalis is a parabasalian flagellate estimated to infect 3% of the world's population annually. With a 160-megabase genome and up to 60,000 genes residing on six chromosomes, the parasite has the largest genome among sequenced protists. Although the genome size and unusually large coding capacity are thought to be owed to genome duplication events, the exact reason and its consequences are less well studied.
In transcriptome data we found thousands of instances in which reads mapped onto genomic loci not annotated as genes, some reaching several kilobases in length. At first sight these appear to represent long non-coding RNAs (lncRNAs); however, about half of them have significant sequence similarity to genomic loci annotated as protein-coding genes. This provides evidence for the transcription of hundreds of pseudogenes in the parasite. In Trichomonas, conventional lncRNAs and pseudogenes are expressed from their own transcription start sites and independently of flanking genes. Expression of several representative lncRNAs was verified by reverse-transcriptase PCR in different T. vaginalis strains, and case studies exclude the use of alternative start codons or stop codon suppression for the genes analysed.
Our results demonstrate that T. vaginalis expresses thousands of intergenic loci, including numerous transcribed pseudogenes. In contrast to yeast, these are expressed independently of neighbouring genes. Our results furthermore illustrate the effect that genome duplication events can have on the transcriptome of a protist. The parasite's genome is in a constant state of change, and we hypothesize that the numerous lncRNAs could offer a large pool for potential innovation from which novel proteins or regulatory RNA units could evolve.