Lowy-Gallego Ernesto, Fairley Susan, Zheng-Bradley Xiangqun, Ruffier Magali, Clarke Laura, Flicek Paul
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
Wellcome Open Res. 2019 Dec 30;4:50. doi: 10.12688/wellcomeopenres.15126.2. eCollection 2019.
We present a set of biallelic SNVs and INDELs, from 2,548 samples spanning 26 populations from the 1000 Genomes Project, called on GRCh38. We believe this will be a useful reference resource for those using GRCh38. It represents an improvement over the "lift-overs" of the 1000 Genomes Project data that have been available to date by encompassing all of the GRCh38 primary assembly autosomes and pseudo-autosomal regions, including novel, medically relevant loci. Here, we describe how the data set was created and benchmark our call set against that produced by the final phase of the 1000 Genomes Project on GRCh37 and the lift-over of that data to GRCh38.