Martins C, Reis-Cunha J L, Silva M N, Pereira E G, Pappas G J, Bartholomeu D C, Zingales B
Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, Brasil.
Genet Mol Res. 2011;10(3):1589-630. doi: 10.4238/vol10-3gmr1140.
Approximately 50% of the predicted protein-coding genes of the Trypanosoma cruzi CL Brener strain are annotated as hypothetical or conserved hypothetical proteins. To further characterize these genes, we generated 1161 open-reading frame expressed sequence tags (ORESTES) from the mammalian stages of the VL10 human strain. Sequence clustering yielded 435 clusters, comprising 339 singletons and 96 contigs. Significant matches to the T. cruzi predicted gene database were found for ~94% of the contigs and ~69% of the singletons. These included genes encoding surface proteins that are known to be highly expressed in the parasite's mammalian stages and are implicated in host cell invasion and/or immune evasion mechanisms. Of the 151 contigs and singletons similar to predicted hypothetical or conserved hypothetical protein-coding genes, 83% showed no match in T. cruzi EST and/or proteome databases; for these genes, the ORESTES provide the first experimental evidence that they are in fact transcribed. Sequences with no significant match were searched against several T. cruzi databases and the National Center for Biotechnology Information (NCBI) non-redundant sequence databases. The ORESTES analysis indicated that 124 predicted conserved hypothetical protein-coding genes and 27 predicted hypothetical protein-coding genes annotated in the CL Brener genome are transcribed in the VL10 mammalian stages. For six ORESTES annotated as hypothetical protein-coding genes and showing no match to EST and/or proteome databases, transcription in VL10 was confirmed by Northern blot. This set of ORESTES complements the T. cruzi genome annotation and points to new stage-regulated genes encoding hypothetical proteins.