Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA.
Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA.
Nat Commun. 2019 Apr 16;10(1):1784. doi: 10.1038/s41467-018-08148-z.
The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios and define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also discover 156 inversions per genome, 58 of which intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a three- to sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The methods and the dataset presented serve as a gold standard for the scientific community, allowing us to make recommendations for maximizing structural variation sensitivity in future genome sequencing studies.