
Towards complete and error-free genome assemblies of all vertebrate species.

Affiliations

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.

Department of Genetics, University of Cambridge, Cambridge, UK.

Publication information

Nature. 2021 Apr;592(7856):737-746. doi: 10.1038/s41586-021-03451-0. Epub 2021 Apr 28.

DOI: 10.1038/s41586-021-03451-0
PMID: 33911273
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8081667/
Abstract

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species. To address this issue, the international Genome 10K (G10K) consortium has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6912/8081667/35d04bc38998/41586_2021_3451_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6912/8081667/94001bd4e00e/41586_2021_3451_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6912/8081667/c8e056478349/41586_2021_3451_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6912/8081667/da2f844ae50f/41586_2021_3451_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6912/8081667/182cc238ddc8/41586_2021_3451_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6912/8081667/68a6aaec76cc/41586_2021_3451_Fig6_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6912/8081667/38d9216616d7/41586_2021_3451_Fig7_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6912/8081667/07ec91940790/41586_2021_3451_Fig8_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6912/8081667/c4a56ae91efe/41586_2021_3451_Fig9_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6912/8081667/426073e743a9/41586_2021_3451_Fig10_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6912/8081667/34eb47d02ecd/41586_2021_3451_Fig11_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6912/8081667/7834645cc1b1/41586_2021_3451_Fig12_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6912/8081667/0ccef129d83b/41586_2021_3451_Fig13_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6912/8081667/89bb7818efcb/41586_2021_3451_Fig14_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6912/8081667/8a021b66e58d/41586_2021_3451_Fig15_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6912/8081667/e356c474015b/41586_2021_3451_Fig16_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6912/8081667/3d3c3187fdcd/41586_2021_3451_Fig17_ESM.jpg

Similar articles

1
Towards complete and error-free genome assemblies of all vertebrate species.
Nature. 2021 Apr;592(7856):737-746. doi: 10.1038/s41586-021-03451-0. Epub 2021 Apr 28.
2
Complete vertebrate mitogenomes reveal widespread repeats and gene duplications.
Genome Biol. 2021 Apr 29;22(1):120. doi: 10.1186/s13059-021-02336-9.
3
False gene and chromosome losses in genome assemblies caused by GC content variation and repeats.
Genome Biol. 2022 Sep 27;23(1):204. doi: 10.1186/s13059-022-02765-0.
4
Widespread false gene gains caused by duplication errors in genome assemblies.
Genome Biol. 2022 Sep 27;23(1):205. doi: 10.1186/s13059-022-02764-1.
5
CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes.
Gigascience. 2020 May 1;9(5). doi: 10.1093/gigascience/giaa034.
6
Bridging the Gap between Vertebrate Cytogenetics and Genomics with Single-Chromosome Sequencing (ChromSeq).
Genes (Basel). 2021 Jan 19;12(1):124. doi: 10.3390/genes12010124.
7
Semi-automated assembly of high-quality diploid human reference genomes.
Nature. 2022 Nov;611(7936):519-531. doi: 10.1038/s41586-022-05325-5. Epub 2022 Oct 19.
8
Progressive Cactus is a multiple-genome aligner for the thousand-genome era.
Nature. 2020 Nov;587(7833):246-251. doi: 10.1038/s41586-020-2871-y. Epub 2020 Nov 11.
9
Highly accurate long reads are crucial for realizing the potential of biodiversity genomics.
BMC Genomics. 2023 Mar 16;24(1):117. doi: 10.1186/s12864-023-09193-9.
10
A linked-read approach to museomics: Higher quality de novo genome assemblies from degraded tissues.
Mol Ecol Resour. 2020 Jul;20(4):856-870. doi: 10.1111/1755-0998.13155. Epub 2020 May 11.

Cited by

1
PlantCAD2: A Long-Context DNA Language Model for Cross-Species Functional Annotation in Angiosperms.
bioRxiv. 2025 Sep 1:2025.08.27.672609. doi: 10.1101/2025.08.27.672609.
2
The genome sequence of the Scarce Copper, (Linnaeus, 1758) (Lepidoptera: Lycaenidae).
Wellcome Open Res. 2025 Aug 11;10:434. doi: 10.12688/wellcomeopenres.24748.1. eCollection 2025.
3
The genome sequence of the Silver-washed Fritillary, (Linnaeus, 1758) (Lepidoptera: Nymphalidae).
Wellcome Open Res. 2025 Jul 31;10:399. doi: 10.12688/wellcomeopenres.24635.1. eCollection 2025.
4
The genome sequence of the de Prunner's Ringlet, von Prunner, 1798 (Lepidoptera: Nymphalidae).
Wellcome Open Res. 2025 Aug 11;10:425. doi: 10.12688/wellcomeopenres.24693.1. eCollection 2025.
5
The genome sequence of the Violet Copper, (Denis & Schiffermüller), 1776 (Lepidoptera: Lycaenidae).
Wellcome Open Res. 2025 Aug 11;10:429. doi: 10.12688/wellcomeopenres.24699.1. eCollection 2025.
6
The reference genome of the human diploid cell line RPE-1.
Nat Commun. 2025 Sep 12;16(1):7751. doi: 10.1038/s41467-025-62428-z.
7
The genome sequence of the Marsh Pennywort, L. (Apiales: Araliaceae).
Wellcome Open Res. 2025 Jul 28;10:370. doi: 10.12688/wellcomeopenres.24582.1. eCollection 2025.
8
An Intergeneric Hybrid Between Historically Isolated Temperate and Tropical Jays Following Recent Range Expansion.
Ecol Evol. 2025 Sep 10;15(9):e72148. doi: 10.1002/ece3.72148. eCollection 2025 Sep.
9
The genome sequence of (Scopoli, 1763) (Lepidoptera: Geometridae).
Wellcome Open Res. 2025 Jul 30;10:392. doi: 10.12688/wellcomeopenres.24664.1. eCollection 2025.
10
The genome sequence of the Black Hairstreak, (Linnaeus, 1758) (Lepidoptera: Lycaenidae).
Wellcome Open Res. 2025 Jul 28;10:377. doi: 10.12688/wellcomeopenres.24619.1. eCollection 2025.

References

1
Population genomics of the critically endangered kākāpō.
Cell Genom. 2021 Sep 8;1(1):100002. doi: 10.1016/j.xgen.2021.100002. eCollection 2021 Oct 13.
2
Widespread false gene gains caused by duplication errors in genome assemblies.
Genome Biol. 2022 Sep 27;23(1):205. doi: 10.1186/s13059-022-02764-1.
3
Universal nomenclature for oxytocin-vasotocin ligand and receptor families.
Nature. 2021 Apr;592(7856):747-755. doi: 10.1038/s41586-020-03040-7. Epub 2021 Apr 28.
4
Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C.
Nat Commun. 2021 Apr 28;12(1):1935. doi: 10.1038/s41467-020-20536-y.
5
Complete vertebrate mitogenomes reveal widespread repeats and gene duplications.
Genome Biol. 2021 Apr 29;22(1):120. doi: 10.1186/s13059-021-02336-9.
6
The structure, function and evolution of a complete human chromosome 8.
Nature. 2021 May;593(7857):101-107. doi: 10.1038/s41586-021-03420-7. Epub 2021 Apr 7.
7
Significantly improving the quality of genome assemblies through curation.
Gigascience. 2021 Jan 9;10(1). doi: 10.1093/gigascience/giaa153.
8
Platypus and echidna genomes reveal mammalian biology and evolution.
Nature. 2021 Apr;592(7856):756-762. doi: 10.1038/s41586-020-03039-0. Epub 2021 Jan 6.
9
Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies.
Genome Biol. 2020 Sep 14;21(1):245. doi: 10.1186/s13059-020-02134-9.
10
Broad host range of SARS-CoV-2 predicted by comparative and structural analysis of ACE2 in vertebrates.
Proc Natl Acad Sci U S A. 2020 Sep 8;117(36):22311-22322. doi: 10.1073/pnas.2010146117. Epub 2020 Aug 21.