A High Quality Asian Genome Assembly Identifies Features of Common Missing Regions.

Authors

Kim Jina, Sung Joohon, Han Kyudong, Lee Wooseok, Mun Seyoung, Lee Jooyeon, Bahk Kunhyung, Yang Inchul, Bae Young-Kyung, Kim Changhoon, Kim Jong-Il, Seo Jeong-Sun

Affiliations

Interdisciplinary Program of Bioinformatics, College of Natural Science, Seoul National University, Seoul 08826, Korea.

Genome & Health Big Data Laboratory, Department of Health Science, Seoul National University, Seoul 08826, Korea.

Publication information

Genes (Basel). 2020 Nov 13;11(11):1350. doi: 10.3390/genes11111350.

DOI:10.3390/genes11111350
PMID:33202901
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7697454/
Abstract

The current human reference genome (GRCh38), with its superior quality, has contributed significantly to genome analysis. However, GRCh38 may still underrepresent ethnic genomes, specifically those of Asians, though exactly what is missing remains elusive. Here, we juxtaposed GRCh38 with a high-contiguity genome assembly of one Korean individual (AK1) to show that part of the AK1 genome is missing from GRCh38 and that the missing regions harbor 1390 putative coding elements. Furthermore, when we analyzed the reads "unmapped" to GRCh38 from fourteen individuals (five East Asians, four Europeans, and five Africans), we found that multiple populations shared certain parts of the missing genome, amounting to ~5.3 Mb (0.2% of AK1). The AK1 regions recovered from the "unmapped" reads, which are the estimated missing regions absent from GRCh38, harbored candidate coding elements. We verified that most of the common missing regions (shared by ≥7 individuals) exist in human and chimpanzee DNA. Moreover, we further identified the occurrence mechanism and ethnic heterogeneity, as well as the presence, of the common missing regions. This study illuminates a potential advantage of using a pangenome reference and raises the need for further investigation of the various features of regions globally missed in GRCh38.
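
The abstract defines a "common" missing region as one shared by ≥7 of the fourteen individuals. As a minimal sketch of that thresholding step only, and not the authors' pipeline, the Python below counts how many individuals have unmapped reads landing in each region and keeps the regions that meet the threshold. The function name `common_regions`, the sample labels, and the region IDs are hypothetical, and the toy data are invented for illustration.

```python
from collections import Counter


def common_regions(hits_by_individual, min_individuals=7):
    """Return the region IDs observed in at least `min_individuals` individuals.

    `hits_by_individual` maps an individual's label to the collection of
    region IDs hit by that individual's reference-unmapped reads.
    """
    counts = Counter()
    for regions in hits_by_individual.values():
        # set() ensures each individual is counted once per region.
        counts.update(set(regions))
    return {region for region, n in counts.items() if n >= min_individuals}


# Hypothetical toy cohort mirroring the 5 + 4 + 5 design in the abstract.
individuals = (
    [f"EAS{i}" for i in range(1, 6)]
    + [f"EUR{i}" for i in range(1, 5)]
    + [f"AFR{i}" for i in range(1, 6)]
)

# Invented region IDs: one seen in everyone, one in five people, one in seven.
hits = {name: {"r_all"} for name in individuals}
for name in individuals[:5]:
    hits[name].add("r_eas_only")
for name in individuals[:7]:
    hits[name].add("r_seven")

print(sorted(common_regions(hits)))  # ['r_all', 'r_seven']
```

With the default threshold of 7, the region seen in only five individuals is excluded, while the regions seen in seven and in all fourteen are retained.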


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6abf/7697454/06a3e08d8f13/genes-11-01350-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6abf/7697454/040b4715c308/genes-11-01350-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6abf/7697454/2e4d97c00531/genes-11-01350-g002.jpg
