Papathanos Philippos Aris, Windbichler Nikolai
Department of Experimental Medicine, Section of Genomics and Genetics, University of Perugia, Perugia, Italy.
Department of Life Sciences, Imperial College London, Sir Alexander Fleming Building, South Kensington Campus, London, United Kingdom.
CRISPR J. 2018 Feb 1;1(1):88-98. doi: 10.1089/crispr.2017.0012.
CRISPR-based synthetic sex ratio distorters, which operate by shredding the X chromosome during male meiosis, are promising tools for the area-wide control of harmful insect pest or disease vector species. X-shredders have been proposed as tools to suppress insect populations by biasing the sex ratio of the wild population toward males, thus reducing its natural reproductive potential. However, building synthetic X-shredders based on CRISPR requires gRNA targets in the form of high-copy sequence repeats on the X chromosome of a given species, and selecting them is difficult because such repeats are not accurately resolved in genome assemblies and cannot be assigned to chromosomes with confidence. We have therefore developed the redkmer computational pipeline, designed to identify short and highly abundant sequence elements occurring uniquely on the X chromosome. Redkmer takes as input minimally processed whole genome sequence data from males and females. We tested redkmer with short- and long-read whole genome sequence data of Anopheles gambiae, the major vector of human malaria, in which the X-shredding paradigm was originally developed. Redkmer established long reads as chromosomal proxies with excellent correlation to the genome assembly and used them to rank X-candidate kmers for their level of X-specificity and abundance. Among these, a high-confidence set of 25-mers was identified, many belonging to previously known X-chromosome repeats of An. gambiae, including the ribosomal gene array and the selfish elements harbored within it. Data from a control strain, in which these repeats are shared with the Y chromosome, confirmed the elimination of these kmers during filtering. Finally, we show that redkmer output can be linked directly to gRNA selection and off-target prediction.
In addition, the output of redkmer, including the prediction of chromosomal origin of single-molecule long reads and chromosome-specific kmers, could also be used for the characterization of other biologically relevant sex chromosome sequences, a task that is frequently hampered by the repetitiveness of sex chromosome sequence content.
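The idea of ranking kmers by X-specificity and abundance from male and female sequence data can be illustrated with a minimal Python sketch. This is not redkmer's implementation: the function names, the thresholds, and the female-to-male count ratio used as the specificity score are all assumptions made for illustration. The sketch relies on the expectation that, in an XX/XY system, X-linked sequence is roughly twice as abundant in females as in males, autosomal sequence is equally abundant in both, and Y-linked sequence is absent from females. It also skips the long-read step described in the abstract and ignores reverse complements.

```python
from collections import Counter


def count_kmers(reads, k=25):
    """Count every k-mer made only of A/C/G/T across a collection of reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[kmer] += 1
    return counts


def rank_x_candidates(female_reads, male_reads, k=25,
                      min_ratio=1.5, max_ratio=2.5, min_male_count=2):
    """Return (kmer, male_count, female_to_male_ratio) tuples, most abundant first.

    Hypothetical scoring: counts are normalised by the total number of k-mers
    in each sample, and a k-mer is kept when its female-to-male ratio falls
    near the value of 2 assumed for X-linked sequence.
    """
    female = count_kmers(female_reads, k)
    male = count_kmers(male_reads, k)
    female_total = sum(female.values()) or 1
    male_total = sum(male.values()) or 1

    candidates = []
    for kmer, male_count in male.items():
        if male_count < min_male_count:  # abundance filter (assumed threshold)
            continue
        female_norm = female.get(kmer, 0) / female_total
        male_norm = male_count / male_total
        ratio = female_norm / male_norm
        if min_ratio <= ratio <= max_ratio:
            candidates.append((kmer, male_count, ratio))

    # Rank by abundance in the male sample, where X-shredding would act.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates
```

Calling `rank_x_candidates(female_reads, male_reads)` on two lists of read strings returns the candidate 25-mers sorted by abundance. Under the same assumptions, repeats shared between the X and Y chromosomes would have a ratio closer to 1 and would be filtered out, in line with the control-strain result described in the abstract.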