CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes.

Affiliations

Department of Ecophysiology and Aquaculture, Leibniz-Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, 12587 Berlin, Germany.

College of Fisheries, Chinese Perch Research Center, Huazhong Agricultural University; Innovation Base for Chinese Perch Breeding, Key Lab of Freshwater Animal Breeding, Ministry of Agriculture, No.1 Shizishan Street, Hongshan District, 430070 Wuhan, Hubei Province, P.R. China.

Publication information

GigaScience. 2020 May 1;9(5). doi: 10.1093/gigascience/giaa034.

PMID: 32449778

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7247394/

Abstract

BACKGROUND

Easy-to-use and fast bioinformatics pipelines for long-read assembly that go beyond the contig level to generate highly continuous chromosome-scale genomes from raw data remain scarce.

RESULTS

Chromosome-Scale Assembler (CSA) is a novel computationally highly efficient bioinformatics pipeline that fills this gap. CSA integrates information from scaffolded assemblies (e.g., Hi-C or 10X Genomics) or even from diverged reference genomes into the assembly process. As CSA performs automated assembly of chromosome-sized scaffolds, we benchmark its performance against state-of-the-art reference genomes, i.e., conventionally built in a laborious fashion using multiple separate assembly tools and manual curation. CSA increases the contig lengths using scaffolding, local re-assembly, and gap closing. On certain datasets, initial contig N50 may be increased up to 4.5-fold. For smaller vertebrate genomes, chromosome-scale assemblies can be achieved within 12 h using low-cost, high-end desktop computers. Mammalian genomes can be processed within 16 h on compute-servers. Using diverged reference genomes for fish, birds, and mammals, we demonstrate that CSA calculates chromosome-scale assemblies from long-read data and genome comparisons alone. Even contig-level draft assemblies of diverged genomes are helpful for reconstructing chromosome-scale sequences. CSA is also capable of assembling ultra-long reads.
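To make the contig N50 figure above concrete, the following is a minimal sketch of how N50 and its fold change could be computed from a list of contig lengths. It is not taken from CSA itself: the function names `n50` and `fold_change` and the toy lengths are illustrative assumptions, and the definition used is the common one (the length of the contig at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total assembly size).

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def fold_change(before, after):
    """Ratio of N50 after an assembly step to N50 before it (hypothetical helper)."""
    return n50(after) / n50(before)


# Toy contig lengths (arbitrary units), invented for illustration only.
initial_contigs = [4, 3, 3, 2, 2, 2, 2, 1, 1]
# The same total sequence after some contigs have been joined.
improved_contigs = [9, 6, 3, 2]

print(n50(initial_contigs))
print(n50(improved_contigs))
print(fold_change(initial_contigs, improved_contigs))
```

In this toy case the total length is unchanged while fewer, longer contigs carry half of it, which is how joining contigs through scaffolding and gap closing raises N50.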

CONCLUSIONS

CSA can speed up and simplify chromosome-level assembly and significantly lower costs of large-scale family-level vertebrate genome projects.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e2f5/7247394/8177061c0d7a/giaa034fig1.jpg
