Pathogen Molecular Biology Department, London School of Hygiene and Tropical Medicine, Keppel St, London, WC1E 7HT, UK.
Malar J. 2018 Oct 1;17(1):345. doi: 10.1186/s12936-018-2475-2.
Within Plasmodium falciparum merozoite surface protein 1 (MSP1), the N-terminal block 2 region is a highly polymorphic target of naturally acquired antibody responses. The antigenic diversity is determined by complex repeat sequences as well as non-repeat sequences, grouping into three major allelic types that appear to be maintained within populations by natural selection. Within these major types, many distinct allelic sequences have been described in different studies, but the extent and significance of the diversity remain unresolved.
To survey the diversity more extensively, block 2 allelic sequences in the msp1 gene were characterized in 2400 P. falciparum infection isolates with whole genome short read sequence data available from the Pf3K project, and compared with the data from previous studies.
Mapping the short read sequence data in the 2400 isolates to a reference library of msp1 block 2 allelic sequences yielded 3815 allele scores at the level of major allelic family types, with 46% of isolates containing two or more of these major types. Overall frequencies were similar to those previously reported in other samples with different methods, the K1-like allelic type being most common in Africa, MAD20-like most common in Southeast Asia, and RO33-like being the third most abundant type in each continent. The rare MR type, formed by recombination between MAD20-like and RO33-like alleles, was seen in Africa and, very rarely, in the Indian subcontinent, but not in Southeast Asia. A combination of mapped short read assembly approaches enabled 1522 complete msp1 block 2 sequences to be determined, among which there were 363 different allele sequences, of which 246 have not been described previously. In these data, the K1-like msp1 block 2 alleles are most diverse and encode 225 distinct amino acid sequences, compared with 123 different MAD20-like, 9 RO33-like and 6 MR type sequences. Within each of the major types, the different allelic sequences show highly skewed geographical distributions, with most of the more common sequences being detected in either Africa or Asia, but not in both.
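The summary figures above (allele scores, proportion of mixed-type isolates, counts of distinct sequences) are simple tallies over per-isolate calls. The following is a minimal sketch of that bookkeeping, not the authors' pipeline: the isolate names and their calls are made-up placeholders, and the function name `summarise` is hypothetical. Only the totals checked at the end (225, 123, 9, 6, 3815, 2400) are taken from the text.

```python
from collections import Counter

# Hypothetical per-isolate calls (placeholder data, not from the study):
# isolate ID -> set of major allelic types detected in that isolate.
calls = {
    "isolate_A": {"K1"},
    "isolate_B": {"K1", "MAD20"},
    "isolate_C": {"MAD20", "RO33"},
    "isolate_D": {"RO33"},
    "isolate_E": {"K1", "MAD20", "RO33"},
}


def summarise(calls):
    """Return per-type counts, total allele scores, and the fraction of
    isolates carrying two or more major allelic types."""
    type_counts = Counter(t for types in calls.values() for t in types)
    total_scores = sum(type_counts.values())
    mixed = sum(1 for types in calls.values() if len(types) >= 2)
    return type_counts, total_scores, mixed / len(calls)


type_counts, total_scores, mixed_fraction = summarise(calls)

# Consistency check on the figures reported in the text: the distinct
# sequences per major type should add up to the overall total of 363.
distinct_by_type = {"K1": 225, "MAD20": 123, "RO33": 9, "MR": 6}
total_distinct = sum(distinct_by_type.values())

# Mean number of major-type allele scores per isolate (3815 over 2400).
mean_scores_per_isolate = 3815 / 2400
```

With the reported numbers, the per-type counts sum to the stated 363 distinct sequences, and the 3815 allele scores correspond to roughly 1.6 major types detected per isolate on average.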
Allelic sequences of this extremely polymorphic locus have been derived from whole genome short read sequence data by mapping to a reference library followed by assembly of mapped reads. The catalogue of sequence variation has been greatly expanded, so that there are now more than 500 different msp1 block 2 allelic sequences described. This provides an extensive reference for molecular epidemiological genotyping and sequencing studies, and potentially for design of a multi-allelic vaccine.
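To illustrate the idea of assigning short reads to a reference library of allelic types before assembly, here is a deliberately simplified sketch. It swaps in exact k-mer sharing for real read alignment, so it is not the method used in the study. The two "reference" strings are arbitrary made-up sequences rather than real msp1 block 2 alleles, and the names `kmers`, `assign_read`, `k` and `min_shared` are assumptions introduced for the example.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_read(read, library, k=5, min_shared=2):
    """Assign a read to the library entry sharing the most k-mers with it.

    Returns None when no entry shares at least min_shared k-mers.
    Ties go to the entry listed first in the library.
    """
    read_kmers = kmers(read, k)
    best_type, best_shared = None, 0
    for name, ref in library.items():
        shared = len(read_kmers & kmers(ref, k))
        if shared > best_shared:
            best_type, best_shared = name, shared
    return best_type if best_shared >= min_shared else None


# Made-up placeholder sequences, NOT real msp1 block 2 alleles.
library = {
    "type_A": "ACGTACGGATTCAGGCTA",
    "type_B": "TTGACCGTAAGCTTGGAC",
}

reads = ["GGATTCAGG", "CGTAAGCTT", "CCCCCCCCC"]

# Group reads by assigned type; unassigned reads are collected under None.
assigned = {}
for read in reads:
    assigned.setdefault(assign_read(read, library), []).append(read)
```

In a real analysis the reads grouped under each type would then be passed to an assembler to reconstruct the full allelic sequence. Exact k-mer matching is used here only because it keeps the example self-contained.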