Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan.
Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan.
Hum Genet. 2021 Aug;140(8):1201-1216. doi: 10.1007/s00439-021-02291-2. Epub 2021 May 12.
Intermediate-sized insertions are one class of structural variant contributing to genome diversity. However, because they are technically difficult to identify, their importance in disease pathogenicity and gene expression regulation remains unclear. We used whole-genome sequencing data from 174 Japanese samples to characterize intermediate-sized insertions with a highly accurate insertion-calling method (IMSindel software and joint-call recovery) and obtained a catalogue of 4254 insertions. We constructed an imputation panel comprising insertions and SNVs from all samples, and imputed intermediate-sized insertions for 82 publicly available Japanese samples. The positive predictive value of imputation, evaluated using Nanopore long-read sequencing data, was 97%. Subsequent eQTL analysis predicted 128 (~3.0%) insertions as causative for changes in gene expression level. Enrichment analysis of causal insertions in genome regulatory elements showed significant associations with CTCF-binding sites, super-enhancers, and promoters. Among 17 causal insertions found in the same causal set as GWAS hits, there were insertions associated with changes in the expression of cancer-related genes such as BRCA1, ZNF222, and ABCB10. Analysis of insertion sequences revealed that 461 insertions were short tandem duplications, frequently found in early-replicating regions of the genome. Furthermore, comparing the functional importance of intermediate-sized insertions with that of intermediate-sized deletions detected in the same sample set in our previous study showed that insertions were more frequent in genic regions, and that the proportion of functional candidates was smaller among insertions. Here, we characterize a high-confidence set of intermediate-sized insertions and indicate their importance in gene expression regulation. Our results emphasize the importance of considering intermediate-sized insertions in trait association studies.
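The percentages quoted in the abstract follow directly from the reported counts. The following is a minimal sketch that checks this arithmetic and spells out the usual definition of positive predictive value; the function names are illustrative, and the true-positive and false-positive counts in the last step are hypothetical values chosen only to show the formula, not figures from the study.

```python
def proportion(part: int, total: int) -> float:
    """Return part/total as a fraction."""
    if total <= 0:
        raise ValueError("total must be positive")
    return part / total


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """PPV = TP / (TP + FP): the fraction of positive calls that are confirmed."""
    called = true_pos + false_pos
    if called == 0:
        raise ValueError("no positive calls to evaluate")
    return true_pos / called


# Counts reported in the abstract.
TOTAL_INSERTIONS = 4254
CAUSAL_INSERTIONS = 128
TANDEM_DUPLICATIONS = 461

causal_pct = 100 * proportion(CAUSAL_INSERTIONS, TOTAL_INSERTIONS)
tandem_pct = 100 * proportion(TANDEM_DUPLICATIONS, TOTAL_INSERTIONS)
print(f"insertions predicted as causal: {causal_pct:.1f}%")
print(f"short tandem duplications: {tandem_pct:.1f}%")

# Hypothetical validation counts (NOT from the study), used only to
# illustrate how a PPV of 0.97 would arise from confirmed and unconfirmed calls.
example_ppv = positive_predictive_value(true_pos=97, false_pos=3)
print(f"example PPV: {example_ppv:.2f}")
```

The abstract does not state the underlying validation counts, so the sketch reproduces only the definition of the metric, not the study's evaluation itself.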