Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan.
Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan.
Genome Med. 2019 Jul 24;11(1):44. doi: 10.1186/s13073-019-0656-4.
Next-generation sequencing has enabled the identification of many kinds of genetic variation known to contribute to disease. Among these, insertions and deletions are the second most abundant type of variation in the genome, but their biological importance and disease associations remain poorly studied, especially for intermediate-sized deletions.
We identified intermediate-sized deletions in whole-genome sequencing (WGS) data from Japanese samples (n = 174) using a novel deletion-calling method that considers multiple samples jointly. We used these deletions to construct a reference panel for imputation. We then imputed the deletions into 82 publicly available Japanese samples with gene expression data using this panel. We examined the accuracy of deletion calling and imputation with Nanopore long-read sequencing. We also conducted an expression quantitative trait loci (eQTL) association analysis with the deletions to infer their functional impact on genes, and then characterized the deletions likely to be causal for changes in gene expression levels.
We obtained a set of 4378 high-confidence polymorphic deletions and used them to construct a reference panel. The deletions were imputed into the Japanese samples with high accuracy (97.3%). The eQTL analysis identified 181 deletions (4.1%) as candidate causal variants for changes in gene expression levels. These candidate causal deletions were significantly enriched in promoters, super-enhancers, and transcription-elongation chromatin states. Introducing the deletions into a cell line with the CRISPR-Cas9 system confirmed that they were indeed causative for the expression changes. Furthermore, one deletion affected the expression level of a gene in which it was not located.
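The abstract reports imputation accuracy as a single percentage without defining it. One common definition is genotype concordance, the fraction of comparable genotypes where the imputed call equals the validation call (here, Nanopore long reads). The sketch below assumes that definition and a 0/1/2 deletion-allele coding; the function name `genotype_concordance` is hypothetical and is not the authors' code.

```python
from typing import Optional, Sequence


def genotype_concordance(imputed: Sequence[Optional[int]],
                         truth: Sequence[Optional[int]]) -> float:
    """Hypothetical concordance between imputed and validation genotypes.

    Genotypes are deletion-allele counts (0, 1, 2); None marks a missing call,
    and any pair with a missing call is skipped.
    """
    if len(imputed) != len(truth):
        raise ValueError("imputed and truth must have the same length")
    pairs = [(i, t) for i, t in zip(imputed, truth) if i is not None and t is not None]
    if not pairs:
        raise ValueError("no overlapping non-missing genotypes")
    matches = sum(1 for i, t in pairs if i == t)
    return matches / len(pairs)


# Toy data: 7 comparable calls (one missing), 6 of which agree.
imputed = [0, 1, 2, 1, None, 0, 2, 1]
truth = [0, 1, 2, 0, 1, 0, 2, 1]
print(f"{genotype_concordance(imputed, truth):.3f}")  # 6/7
```

Simple concordance is dominated by the most common genotype at each site, so studies sometimes also report per-genotype or non-reference concordance. Which metric underlies the 97.3% figure is not stated in the abstract.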
This paper reports an accurate deletion-calling method for genome-wide genotype imputation and demonstrates the importance of intermediate-sized deletions in the human population.