Cuadrat Rafael R C, da Serra Cruz Sérgio Manuel, Tschoeke Diogo Antônio, Silva Edno, Tosta Frederico, Jucá Henrique, Jardim Rodrigo, Campos Maria Luiza M, Mattoso Marta, Dávila Alberto M R
Computational and Systems Biology Laboratory, Computational and Systems Biology Pole, Oswaldo Cruz Institute, Fiocruz, Brazil.
OMICS. 2014 Aug;18(8):524-38. doi: 10.1089/omi.2013.0172. Epub 2014 Jun 24.
A key focus in 21st-century integrative biology and drug discovery for neglected tropical and other diseases has been the use of BLAST-based computational methods to identify orthologous groups in pathogenic organisms, with a view to evaluating similarities and differences among species and thus allowing the transfer of annotation from known/curated proteins to new/non-annotated ones. Here, we used a sensitive profile-based methodology to identify distant homologs, coupled to the NCBI's COG (unicellular orthologs) and KOG (eukaryote orthologs) databases, permitting us to perform comparative genomics analyses on five protozoan genomes. OrthoSearch was applied to five protozoan proteomes, showing that 3901 and 7473 orthologs can be identified by comparison with the COG and KOG proteomes, respectively. The inferred core protozoan proteome comprised 418 Protozoa-COG orthologous groups and 704 Protozoa-KOG orthologous groups: (i) using COG, 31.58% (132/418) belong to category J (translation, ribosomal structure, and biogenesis) and 9.81% (41/418) to category O (post-translational modification, protein turnover, chaperones); (ii) using KOG, 21.45% (151/704) belong to category J and 13.92% (98/704) to category O. The phylogenomic analysis showed four well-supported clades for Eukarya, discriminating multicellular [(i) human, fly, plant, and worm] and unicellular [(ii) yeast, (iii) fungi, and (iv) protozoa] species. These encouraging results attest to the usefulness of the profile-based methodology for comparative genomics in accelerating semi-automatic re-annotation, especially of protozoan proteomes. This approach may also lend itself to applications in global health, for example, novel drug target discovery against pathogenic organisms previously considered difficult to research with traditional drug discovery tools.
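As a rough illustration of the arithmetic in the abstract, the sketch below recomputes the reported category shares from the stated counts and shows one way a core set could be derived. It is not OrthoSearch itself: the definition of "core" as the groups present in every proteome is an assumption, and the species labels and group IDs in the toy input are hypothetical placeholders, not data from the study.

```python
# Counts reported in the abstract: (groups in category, core groups in total).
REPORTED = {
    ("COG", "J"): (132, 418),
    ("COG", "O"): (41, 418),
    ("KOG", "J"): (151, 704),
    ("KOG", "O"): (98, 704),
}


def share(count, total):
    """Percentage of core groups falling in a category, to two decimals."""
    return round(100.0 * count / total, 2)


def core_groups(groups_by_proteome):
    """Groups found in every proteome (assumed definition of 'core')."""
    return set.intersection(*(set(g) for g in groups_by_proteome.values()))


shares = {key: share(*counts) for key, counts in REPORTED.items()}

# Hypothetical toy input: labels and IDs are placeholders, not study data.
toy = {
    "species_a": {"G1", "G2", "G3"},
    "species_b": {"G1", "G3", "G4"},
    "species_c": {"G1", "G3"},
}
toy_core = core_groups(toy)

if __name__ == "__main__":
    for (database, category), pct in shares.items():
        print(f"{database} category {category}: {pct}%")
    print("toy core:", sorted(toy_core))
```

The recomputed shares agree with the percentages quoted in the abstract; the set intersection is only meant to make the notion of a shared core concrete.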