Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, USA.
PLoS One. 2013;8(4):e59494. doi: 10.1371/journal.pone.0059494. Epub 2013 Apr 5.
Whole genome sequencing studies are essential for a comprehensive understanding of the full spectrum of human genomic variation. Here we report the results of a high-coverage whole genome sequencing study of 44 unrelated, healthy Caucasian adults, each sequenced to over 50-fold coverage (average 65.8×). We identified approximately 11 million single nucleotide polymorphisms (SNPs), 2.8 million short insertions and deletions (indels), and over 500,000 block substitutions. We showed that, although previous studies, including the 1000 Genomes Project Phase 1 study, have catalogued the vast majority of common SNPs, many low-frequency and rare variants remain undiscovered. For instance, approximately 1.4 million SNPs and 1.3 million short indels that we found were novel to both the dbSNP and the 1000 Genomes Project Phase 1 data sets, and the majority of these (∼96%) have a minor allele frequency of less than 5%. On average, each individual genome carried ∼3.3 million SNPs and ∼492,000 indels/block substitutions, including approximately 179 variants predicted to cause loss of function of the gene products. Moreover, each individual genome carried an average of 44 such loss-of-function variants in a homozygous state, which would completely "knock out" the corresponding genes. Across all 44 genomes, a total of 182 genes were "knocked out" in at least one individual genome, and 46 of these were "knocked out" in over 30% of our samples, suggesting that a number of genes are commonly "knocked out" in the general population. Gene ontology analysis suggested that these commonly "knocked-out" genes are enriched in biological processes related to antigen processing and immune response. Our results contribute towards a comprehensive characterization of human genomic variation, especially for less-common and rare variants, and provide an invaluable resource for future genetic studies of human variation and disease.
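The frequency and knockout criteria in the abstract come down to simple counting over a cohort of diploid genomes. The Python sketch below illustrates that arithmetic only; it is not the study's pipeline. The function names are hypothetical, and the genotype encoding (each sample's count of the alternate allele, 0/1/2, with the alternate allele assumed to be the loss-of-function allele) is an assumption the abstract does not specify. With 44 samples there are 88 chromosomes, so a minor allele frequency below 5% means at most 4 copies of the minor allele (4/88 ≈ 0.045), and "over 30% of our samples" means at least 14 individuals (14/44 ≈ 0.32). Following the abstract, a gene counts as "knocked out" in a sample only when that sample is homozygous for a loss-of-function variant; compound heterozygotes are ignored.

```python
"""Illustrative sketch only: hypothetical helpers, not the study's actual pipeline."""
from __future__ import annotations


def minor_allele_frequency(genotypes: list[int]) -> float:
    """MAF of a biallelic site.

    Assumption: each genotype is the sample's ALT-allele count (0, 1 or 2).
    """
    n_chromosomes = 2 * len(genotypes)  # diploid: two alleles per sample
    alt_freq = sum(genotypes) / n_chromosomes
    return min(alt_freq, 1.0 - alt_freq)


def is_low_frequency(genotypes: list[int], threshold: float = 0.05) -> bool:
    """True when the minor allele frequency is below `threshold` (5% by default)."""
    return minor_allele_frequency(genotypes) < threshold


def knocked_out_samples(variants: list[list[int]]) -> set[int]:
    """Indices of samples homozygous (genotype 2) for at least one LoF variant of a gene.

    Assumption: the ALT allele is the loss-of-function allele; compound
    heterozygotes are deliberately not counted as knockouts.
    """
    return {i for genotypes in variants for i, g in enumerate(genotypes) if g == 2}


def commonly_knocked_out(
    lof_by_gene: dict[str, list[list[int]]], n_samples: int, fraction: float = 0.30
) -> list[str]:
    """Genes knocked out (homozygous LoF) in more than `fraction` of all samples."""
    return sorted(
        gene
        for gene, variants in lof_by_gene.items()
        if len(knocked_out_samples(variants)) / n_samples > fraction
    )


if __name__ == "__main__":
    n = 44  # cohort size from the abstract
    print(is_low_frequency([1] * 4 + [0] * 40))  # 4/88 ≈ 0.045 -> True
    print(is_low_frequency([1] * 5 + [0] * 39))  # 5/88 ≈ 0.057 -> False
    # Hypothetical genes with one LoF variant each
    lof = {"GENE_A": [[2] * 14 + [0] * 30], "GENE_B": [[2] * 13 + [0] * 31]}
    print(commonly_knocked_out(lof, n))  # 14/44 > 0.30 but 13/44 is not -> ['GENE_A']
```

The 4-copy and 14-sample cut-offs follow directly from the cohort size, which is why, in a 44-genome study, "rare" and "common knockout" are effectively integer thresholds on allele and sample counts.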