Pasini Erica M, Böhme Ulrike, Rutledge Gavin G, Voorberg-Van der Wel Annemarie, Sanders Mandy, Berriman Matt, Kocken Clemens HM, Otto Thomas Dan
Biomedical Primate Research Centre, Rijswijk, Lange Kleiweg 161, 2288GJ Rijswijk, Netherlands.
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.
Wellcome Open Res. 2017 Jun 16;2:42. doi: 10.12688/wellcomeopenres.11864.1. eCollection 2017.
Background: Plasmodium cynomolgi, a non-human primate malaria parasite species, has been an important model parasite since its discovery in 1907. Similarities in the biology of P. cynomolgi to the closely related, but less tractable, human malaria parasite P. vivax make it the model parasite of choice for liver biology and vaccine studies pertinent to P. vivax malaria. Molecular and genome-scale studies of P. cynomolgi have relied on the current reference genome sequence, which remains highly fragmented, with 1,649 unassigned scaffolds and little representation of the subtelomeres.
Methods: Using long-read sequence data (Pacific Biosciences SMRT technology), we assembled and annotated a new P. cynomolgi reference genome sequence, PcyM, sourced from an Indian rhesus monkey. We compared the newly assembled genome sequence with those of several other Plasmodium species, including a re-annotated P. vivax assembly.
Results: The new PcyM genome assembly is of significantly higher quality than the existing reference, comprising only 56 pieces, with no gaps and an improved average gene length. Detailed manual curation has ensured a comprehensive annotation of the genome with 6,632 genes, nearly 1,000 more than previously attributed to P. cynomolgi. The new assembly also has an improved representation of the subtelomeric regions, which account for nearly 40% of the sequence. Within the subtelomeres, we identified more than 1,300 Plasmodium interspersed repeat (pir) genes, as well as a striking expansion of 36 methyltransferase pseudogenes that originated from a single copy on chromosome 9.
Conclusions: The manually curated PcyM reference genome sequence is an important new resource for the malaria research community. The high quality and contiguity of the data have enabled the discovery of a novel expansion of methyltransferase pseudogenes in the subtelomeres, and illustrate the new comparative genomics capabilities that are being unlocked by complete reference genomes.