Department of Microbiology, Cell and Tumor biology (MTC), Karolinska Institutet, SE-17177 Stockholm, Sweden.
BMC Genomics. 2011 Feb 18;12:119. doi: 10.1186/1471-2164-12-119.
Many parasites use multicopy protein families to evade their host's immune system through a strategy called antigenic variation. RIFIN and STEVOR proteins are variable surface antigens found only in the malaria parasites Plasmodium falciparum and P. reichenowi. Although these two protein families are distinct, they are more similar to each other than to any other proteins described to date. As a result, they have been grouped together in one Pfam domain. However, a recent study has described the sub-division of the RIFIN protein family into several functionally distinct groups. Assigning sequences to these sub-groups requires phylogenetic analysis, which is not practical for large-scale projects such as the sequencing of patient isolates and metagenomic analysis.
We have manually curated the rif and stevor gene repertoires of two Plasmodium falciparum genomes, isolates DD2 and HB3. We identified 25% of the rif and stevor genes as mis-annotated and found ~30 that were missing. Using these data sets, as well as sequences from the well-curated reference genome (isolate 3D7) and field isolate data from UniProt, we have developed a tool named RSpred. The tool, based on a set of hidden Markov models and an evaluation program, automatically identifies STEVOR and RIFIN sequences as well as the sub-groups A-RIFIN, B-RIFIN, B1-RIFIN and B2-RIFIN. In addition to these groups, we distinguish a small subset of STEVOR proteins that we named STEVOR-like, as they either differ markedly from typical STEVOR proteins or are too fragmented to reach a high enough score. Compared with Pfam and TIGRFAMs, RSpred proves to be a more robust and more sensitive method. We have applied RSpred to the proteomes of several P. falciparum strains, P. reichenowi, P. vivax, P. knowlesi and the rodent malaria species. All groups were found in the P. falciparum strains and in P. reichenowi, whereas none were predicted in the other species.
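The evaluation step described above, in which per-group model scores are turned into a single label, might look roughly like the following minimal sketch. It is not RSpred's actual evaluation program: the function name `classify`, the two cutoff values, and the rule of taking the best-scoring model are all assumptions made for illustration, and the input scores are assumed to come from a separate hidden Markov model search.

```python
from typing import Dict, Optional

# Hypothetical cutoffs: the abstract does not state RSpred's real thresholds.
FAMILY_CUTOFF = 50.0       # score needed to accept a group assignment
STEVOR_LIKE_CUTOFF = 20.0  # weaker STEVOR hits are labelled STEVOR-like


def classify(scores: Dict[str, float]) -> Optional[str]:
    """Pick a group label from per-model scores for one protein sequence.

    `scores` maps a group name (e.g. "A-RIFIN", "STEVOR") to the score that
    group's model gave the sequence. Returns None if no model scores highly
    enough to support any assignment.
    """
    if not scores:
        return None

    # Take the model with the highest score as the candidate group.
    best_label, best_score = max(scores.items(), key=lambda item: item[1])

    if best_score >= FAMILY_CUTOFF:
        return best_label

    # A STEVOR hit that is too weak for a confident call (atypical or
    # fragmented sequence) falls into the STEVOR-like category.
    if best_label == "STEVOR" and best_score >= STEVOR_LIKE_CUTOFF:
        return "STEVOR-like"

    return None
```

With these assumed cutoffs, a sequence scoring 120 against an A-RIFIN model and 10 against a STEVOR model would be labelled A-RIFIN, while a sequence whose best hit is a STEVOR score of 30 would be labelled STEVOR-like.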
We have generated a tool that sorts RIFIN and STEVOR proteins, large groups of antigenically variant proteins, into homogeneous sub-families. Assigning functions to such protein families requires their subdivision into meaningful groups, as we have shown for the RIFIN protein family. RSpred removes the need for complicated and time-consuming phylogenetic analysis. It will benefit both research groups sequencing whole genomes and those working with field isolates. RSpred is freely accessible via http://www.ifm.liu.se/bioinfo/.