Twelve quick steps for genome assembly and annotation in the classroom.

Affiliations

School of Biological Sciences, The University of Queensland, St Lucia, Queensland, Australia.

Centre for Agriculture and Bioeconomy, Queensland University of Technology, Brisbane, Queensland, Australia.

Publication information

PLoS Comput Biol. 2020 Nov 12;16(11):e1008325. doi: 10.1371/journal.pcbi.1008325. eCollection 2020 Nov.

Abstract

Eukaryotic genome sequencing and de novo assembly, once the exclusive domain of well-funded international consortia, have become increasingly affordable, thus fitting the budgets of individual research groups. Third-generation long-read DNA sequencing technologies are increasingly used, providing extensive genomic toolkits that were once reserved for a few select model organisms. Generating high-quality genome assemblies and annotations for many aquatic species still presents significant challenges due to their large genome sizes, complexity, and high chromosome numbers. Indeed, selecting the most appropriate sequencing and software platforms and annotation pipelines for a new genome project can be daunting because tools often only work in limited contexts. In genomics, generating a high-quality genome assembly/annotation has become an indispensable tool for better understanding the biology of any species. Herein, we state 12 steps to help researchers get started in genome projects by presenting guidelines that are broadly applicable (to any species), sustainable over time, and cover all aspects of genome assembly and annotation projects from start to finish. We review some commonly used approaches, including practical methods to extract high-quality DNA and choices for the best sequencing platforms and library preparations. In addition, we discuss the range of potential bioinformatics pipelines, including structural and functional annotations (e.g., transposable elements and repetitive sequences). This paper also includes information on how to build a wide community for a genome project, the importance of data management, and how to make the data and results Findable, Accessible, Interoperable, and Reusable (FAIR) by submitting them to a public repository and sharing them with the research community.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1c4/7660529/08c6a0097616/pcbi.1008325.g001.jpg
