Kosugi Shunichi, Kamatani Yoichiro, Harada Katsutoshi, Tomizuka Kohei, Momozawa Yukihide, Morisaki Takayuki, Terao Chikashi
Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.
Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan.
Cell Genom. 2023 May 18;3(6):100328. doi: 10.1016/j.xgen.2023.100328. eCollection 2023 Jun 14.
Genomic structural variation (SV) affects genetic and phenotypic characteristics in diverse organisms, but the lack of reliable methods to detect SV has hindered genetic analysis. We developed a computational algorithm (MOPline) that combines missing call recovery with high-confidence SV call selection and genotyping using short-read whole-genome sequencing (WGS) data. Using 3,672 high-coverage WGS datasets, MOPline stably detected ∼16,000 SVs per individual, ∼1.7- to 3.3-fold more than previous large-scale projects, while exhibiting comparable statistical quality metrics. We imputed SVs from 181,622 Japanese individuals for 42 diseases and 60 quantitative traits. A genome-wide association study with the imputed SVs revealed 41 top-ranked or nearly top-ranked genome-wide significant SVs, including 8 exonic SVs with 5 novel associations and enriched mobile element insertions. This study demonstrates that short-read WGS data can be used to identify rare and common SVs associated with a variety of traits.