Méthodes et algorithmes pour la Bioinformatique, LIRMM, Univ. Montpellier 2, CNRS; 161 rue Ada, 34392 MONTPELLIER, France.
BMC Genomics. 2010 Jan 15;11:35. doi: 10.1186/1471-2164-11-35.
Plasmodium falciparum is the main causative agent of malaria. Of the 5,484 predicted genes of P. falciparum, about 57% lack sufficient sequence similarity to characterized genes in other species to warrant functional assignments. Non-homology methods are therefore needed to obtain functional clues for these uncharacterized genes. In recent years, gene expression data have been widely used to support functional annotation within a single species through the so-called Guilt By Association (GBA) principle.
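The GBA principle can be illustrated with a minimal sketch: an unannotated gene receives the most frequent annotation among the annotated genes whose expression profiles correlate strongly with its own. The Pearson correlation measure, the 0.8 threshold, the majority vote, and the names `pearson`, `gba_predict` and the toy genes are all illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no coexpression signal
    return cov / (sx * sy)


def gba_predict(gene, profiles, annotations, threshold=0.8):
    """Guilt by association (hypothetical helper): majority vote over the
    annotated genes whose profile correlates with `gene` at >= threshold."""
    votes = Counter()
    for other, profile in profiles.items():
        if other == gene or other not in annotations:
            continue
        if pearson(profiles[gene], profile) >= threshold:
            votes.update(annotations[other])  # each annotation term gets one vote
    if not votes:
        return None  # no coexpressed annotated neighbour: no prediction
    return votes.most_common(1)[0][0]


# Toy data (invented for illustration): g1 is unannotated.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.1],
    "g3": [1.1, 2.1, 2.9, 4.2],
    "g4": [4.0, 3.0, 2.0, 1.0],
}
annotations = {"g2": {"ribosome"}, "g3": {"ribosome"}, "g4": {"proteasome"}}

print(gba_predict("g1", profiles, annotations))  # ribosome
```

Here g2 and g3 rise with g1 and vote for "ribosome", while g4 is anti-correlated and is ignored. Real GBA pipelines typically use more careful coexpression measures and statistical scoring; the majority vote is only the simplest instance of the idea.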
We propose a new method that uses gene expression data to assess inter-species annotation transfers. Our approach starts from a set of likely orthologs between a reference species (here S. cerevisiae and D. melanogaster) and a query species (P. falciparum). It aims to identify clusters of coexpressed genes in the query species whose coexpression is conserved in the reference species. These conserved clusters are then used to assess annotation transfers between genes with low sequence similarity, enabling reliable transfer of annotations from the reference to the query species. We applied the approach to transcriptomic data sets of P. falciparum, S. cerevisiae and D. melanogaster, which enabled us to propose, with high confidence, new or refined annotations for several dozen hypothetical or putative P. falciparum genes. Notably, we revised the annotation of genes encoding ribosomal proteins or involved in ribosome biogenesis and assembly, thus highlighting several potential drug targets.
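The idea of conserved coexpression can be sketched in simplified form. This is not the authors' algorithm. It assumes a one-to-one ortholog map, coexpressed gene pairs already computed in each species, and it defines a "conserved cluster" as a connected component of query-gene pairs that are coexpressed in the query species and whose orthologs are also coexpressed in the reference species. Query genes in such a cluster are then proposed to inherit the reference annotation. All names (`conserved_clusters`, `supported_transfers`, the q*/r* genes) are hypothetical.

```python
from itertools import combinations


def conserved_clusters(orthologs, query_coexpr, ref_coexpr):
    """orthologs: dict query gene -> reference gene (assumed one-to-one).
    query_coexpr / ref_coexpr: sets of frozenset({a, b}) coexpressed pairs."""
    adj = {q: set() for q in orthologs}
    for q1, q2 in combinations(orthologs, 2):
        in_query = frozenset((q1, q2)) in query_coexpr
        in_ref = frozenset((orthologs[q1], orthologs[q2])) in ref_coexpr
        if in_query and in_ref:  # coexpression conserved across species
            adj[q1].add(q2)
            adj[q2].add(q1)
    # Connected components (iterative DFS) of the conserved-edge graph.
    seen, clusters = set(), []
    for start in adj:
        if start in seen or not adj[start]:
            continue  # skip visited genes and genes with no conserved edge
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            g = stack.pop()
            comp.add(g)
            for nb in adj[g]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(comp)
    return clusters


def supported_transfers(orthologs, clusters, ref_annotations):
    """Propose reference annotations only for query genes in a conserved cluster."""
    transfers = {}
    for comp in clusters:
        for q in comp:
            r = orthologs[q]
            if r in ref_annotations:
                transfers[q] = ref_annotations[r]
    return transfers


# Toy data (invented): q3-q4 is coexpressed in the query, but r3-r4 is not.
orthologs = {"q1": "r1", "q2": "r2", "q3": "r3", "q4": "r4"}
query_coexpr = {frozenset(p) for p in [("q1", "q2"), ("q2", "q3"), ("q3", "q4")]}
ref_coexpr = {frozenset(p) for p in [("r1", "r2"), ("r2", "r3")]}
ref_annotations = {"r1": "ribosome biogenesis", "r2": "ribosome biogenesis",
                   "r3": "ribosome biogenesis", "r4": "DNA repair"}

clusters = conserved_clusters(orthologs, query_coexpr, ref_coexpr)
transfers = supported_transfers(orthologs, clusters, ref_annotations)
print(sorted(clusters[0]))  # ['q1', 'q2', 'q3']
print(sorted(transfers))    # ['q1', 'q2', 'q3']
```

In this toy case q4 receives no proposed annotation: its only query-side coexpression partner, q3, is not coexpressed with it in the reference species. The design choice illustrated here is that expression evidence filters sequence-based ortholog candidates rather than replacing them.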
Our approach combines sequence similarity and gene expression data to support inter-species transfer of gene annotations. Experiments show that this strategy is more accurate than using sequence similarity alone and more accurate than the GBA approach. In addition, our experiments with P. falciparum show that it can infer a function for numerous hypothetical genes.