Improving Illumina assemblies with Hi-C and long reads: An example with the North African dromedary.

Affiliations

Department of Integrative Biology and Evolution, Research Institute of Wildlife Ecology, Vetmeduni Vienna, Vienna, Austria.

Intelligent Systems Laboratory, University of Bristol, Bristol, UK.

Publication information

Mol Ecol Resour. 2019 Jul;19(4):1015-1026. doi: 10.1111/1755-0998.13020. Epub 2019 May 17.

DOI:10.1111/1755-0998.13020

PMID:30972949

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6618069/

Abstract

Researchers have assembled thousands of eukaryotic genomes using Illumina reads, but traditional mate-pair libraries cannot span all repetitive elements, resulting in highly fragmented assemblies. However, both chromosome conformation capture techniques, such as Hi-C and Dovetail Genomics Chicago libraries, and long-read sequencing, such as Pacific Biosciences and Oxford Nanopore, help span and resolve repetitive regions and therefore improve genome assemblies. One important livestock species of arid regions that does not have a high-quality contiguous reference genome is the dromedary (Camelus dromedarius). Draft genomes exist but are highly fragmented, and a high-quality reference genome is needed to understand adaptation to desert environments and artificial selection during domestication. Dromedaries are among the last livestock species to have been domesticated, and together with wild and domestic Bactrian camels, they are the only representatives of the Camelini tribe, which highlights their evolutionary significance. Here we describe our efforts to improve the North African dromedary genome. We used Chicago and Hi-C sequencing libraries from Dovetail Genomics to resolve the order of previously assembled contigs, producing nearly chromosome-level scaffolds. Remaining gaps were filled with Pacific Biosciences long reads, and the scaffolds were then comparatively mapped to chromosomes. Long reads added 99.32 Mbp to the total length of the new assembly. Dovetail Chicago and Hi-C libraries increased the longest scaffold over 12-fold, from 9.71 Mbp to 124.99 Mbp, and the scaffold N50 over 50-fold, from 1.48 Mbp to 75.02 Mbp. We demonstrate that Illumina de novo assemblies can be substantially upgraded by combining chromosome conformation capture and long-read sequencing.
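The contiguity figures in the abstract can be checked with a short calculation. The Python sketch below is illustrative only and is not the authors' pipeline: the function names `n50` and `fold_change` and the toy scaffold lengths are assumptions introduced here, and only the four Mbp values come from the abstract. It computes N50 under the standard definition (the scaffold length at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half the assembly total) and the fold increases reported for the longest scaffold and the scaffold N50.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths.

    Scaffolds are sorted from longest to shortest; N50 is the length of
    the scaffold at which the running total first reaches half of the
    total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def fold_change(before, after):
    """Return how many times larger `after` is than `before`."""
    return after / before


# Hypothetical toy assembly (not data from the paper): total length 150,
# half is 75; 50 + 40 = 90 reaches 75, so the N50 is 40.
toy_lengths = [50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)

# Values reported in the abstract, in Mbp.
longest_fold = fold_change(9.71, 124.99)  # longest scaffold
n50_fold = fold_change(1.48, 75.02)       # scaffold N50

print(f"toy N50: {toy_n50}")
print(f"longest scaffold: {longest_fold:.2f}-fold")
print(f"scaffold N50: {n50_fold:.2f}-fold")
```

The two ratios come out at roughly 12.9 and 50.7, which agrees with the "over 12-fold" and "over 50-fold" wording in the abstract.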

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c00/6618069/7ecce1b51dd6/MEN-19-1015-g001.jpg
