New York Genome Center, New York, NY 10013, USA.
Cell. 2022 Sep 1;185(18):3426-3440.e19. doi: 10.1016/j.cell.2022.08.004.
The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.