Assembly and annotation of an Ashkenazi human reference genome.

Affiliations

Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.

Publication information

Genome Biol. 2020 Jun 2;21(1):129. doi: 10.1186/s13059-020-02047-7.

Abstract

BACKGROUND

Thousands of experiments and studies use the human reference genome as a resource each year. This single reference genome, GRCh38, is a mosaic created from a small number of individuals, representing a very small sample of the human population. There is a need for reference genomes from multiple human populations to avoid potential biases.

RESULTS

Here, we describe the assembly and annotation of the genome of an Ashkenazi individual and the creation of a new, population-specific human reference genome. This genome is more contiguous and more complete than GRCh38, the latest version of the human reference genome, and is annotated with highly similar gene content. The Ashkenazi reference genome, Ash1, contains 2,973,118,650 nucleotides, compared with 2,937,639,212 in GRCh38. Annotation identified 20,157 protein-coding genes, of which 19,563 are >99% identical to their counterparts on GRCh38. Most of the remaining genes have small differences. Forty of the protein-coding genes in GRCh38 are missing from Ash1; however, all of these genes are members of multi-gene families for which Ash1 contains other copies. Eleven genes appear on different chromosomes from their homologs in GRCh38. Alignment of DNA sequences from an unrelated Ashkenazi individual to Ash1 identified ~1 million fewer homozygous SNPs than alignment of those same sequences to the more-distant GRCh38 genome, illustrating one of the benefits of population-specific reference genomes.
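
To put the reported counts in proportion, the arithmetic can be sketched as below. This is only an illustrative calculation on the figures quoted in the abstract; the variable names are hypothetical, and the derived values (size difference, share of near-identical genes) are not stated in the source.

```python
# Figures quoted in the abstract (the only inputs taken from the source).
ASH1_NUCLEOTIDES = 2_973_118_650
GRCH38_NUCLEOTIDES = 2_937_639_212
ASH1_PROTEIN_CODING_GENES = 20_157
GENES_OVER_99_PCT_IDENTICAL = 19_563

# Derived values (illustrative; names are assumptions, not from the paper).
extra_nucleotides = ASH1_NUCLEOTIDES - GRCH38_NUCLEOTIDES
extra_pct = 100 * extra_nucleotides / GRCH38_NUCLEOTIDES

remaining_genes = ASH1_PROTEIN_CODING_GENES - GENES_OVER_99_PCT_IDENTICAL
near_identical_pct = 100 * GENES_OVER_99_PCT_IDENTICAL / ASH1_PROTEIN_CODING_GENES

print(f"Ash1 has {extra_nucleotides:,} more nucleotides ({extra_pct:.2f}% larger)")
print(f"{near_identical_pct:.2f}% of genes are >99% identical; "
      f"{remaining_genes} genes are not")
```

On these figures, Ash1 is about 35.5 million nucleotides longer than GRCh38, and roughly 97% of its annotated protein-coding genes meet the >99% identity threshold.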

CONCLUSIONS

The Ash1 genome is presented as a reference for any genetic studies involving Ashkenazi Jewish individuals.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf4f/7265644/6d63d8f40f15/13059_2020_2047_Fig1_HTML.jpg
