Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA.
Biosciences Eastern and Central Africa, International Livestock Research Institute, Nairobi, Kenya.
BMC Genomics. 2020 Apr 3;21(1):279. doi: 10.1186/s12864-020-6683-0.
The apicomplexan parasite Theileria parva causes a livestock disease called East Coast fever (ECF), which puts millions of animals at risk across the parasite's geographic range in sub-Saharan East and Southern Africa. Over a million bovines die of ECF each year, imposing a tremendous economic burden on pastoralists in endemic countries. Comprehensive, accurate parasite genome annotation can facilitate the discovery of novel chemotherapeutic targets for disease treatment, as well as elucidate the biology of the parasite. However, genome annotation remains a significant challenge because of limitations in the quality and quantity of the data used to inform the location and function of protein-coding genes and, when RNA data are used, because of the underlying biological complexity of the processes involved in gene expression. Here, we apply our recently published RNAseq dataset derived from the schizont life-cycle stage of T. parva to update structural and functional gene annotations across the entire nuclear genome.
The re-annotation effort led to evidence-supported updates in over half of all protein-coding sequence (CDS) predictions, including exon changes, gene merges and gene splits, an increase in average CDS length of approximately 50 base pairs, and the identification of 128 new genes. Among the new genes identified were those involved in N-glycosylation, a process previously thought not to exist in this organism and a potentially new chemotherapeutic target pathway for treating ECF. Alternatively spliced genes were identified, and antisense and multi-gene family transcription were extensively characterized.
The process of re-annotation led to novel insights into the organization and expression profiles of protein-coding sequences in this parasite, and uncovered a minimal N-glycosylation pathway that changes our current understanding of the evolution of this post-translational modification in apicomplexan parasites.