A computational genomics pipeline for prokaryotic sequencing projects.

Affiliation

School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.

Publication information

Bioinformatics. 2010 Aug 1;26(15):1819-26. doi: 10.1093/bioinformatics/btq284. Epub 2010 Jun 2.

DOI:10.1093/bioinformatics/btq284

PMID:20519285

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2905547/

Abstract

MOTIVATION

New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data.

RESULTS

We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes.
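The abstract does not say how gene predictions are combined, so the following is only a minimal sketch of one common approach, not the pipeline's actual method. It assumes each predictor reports genes as (contig, strand, start, stop) coordinates, treats predictions that share a contig, strand and stop coordinate as the same gene, keeps genes supported by at least `min_votes` predictions, and picks the most frequently predicted start. The `Gene` type, the `combine_predictions` function and the predictor names are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    """One predicted gene (hypothetical record layout)."""
    contig: str
    strand: str  # "+" or "-"
    start: int   # coordinate of the 5' end (start codon side)
    stop: int    # coordinate of the 3' end (stop codon side)


def combine_predictions(predictions, min_votes=2):
    """Combine gene calls from several predictors by simple voting.

    predictions: dict mapping a predictor name to a list of Gene records.
    Assumption: calls sharing contig, strand and stop are the same gene.
    """
    starts_by_locus = defaultdict(Counter)
    for genes in predictions.values():
        # set() drops exact duplicate calls within a single predictor
        for g in set(genes):
            starts_by_locus[(g.contig, g.strand, g.stop)][g.start] += 1

    consensus = []
    for (contig, strand, stop), starts in starts_by_locus.items():
        # Discard loci with too few supporting predictions
        if sum(starts.values()) < min_votes:
            continue
        # Most-voted start wins; ties go to the longer gene
        best_start = max(starts, key=lambda s: (starts[s], abs(stop - s)))
        consensus.append(Gene(contig, strand, best_start, stop))

    # Order by contig, then by leftmost coordinate
    return sorted(consensus, key=lambda g: (g.contig, min(g.start, g.stop)))


if __name__ == "__main__":
    calls = {
        "predictor_a": [Gene("c1", "+", 100, 400), Gene("c1", "-", 900, 600)],
        "predictor_b": [Gene("c1", "+", 130, 400), Gene("c1", "+", 1000, 1300)],
        "predictor_c": [Gene("c1", "+", 100, 400), Gene("c1", "-", 900, 600)],
    }
    for gene in combine_predictions(calls):
        print(gene)
```

Anchoring on the stop coordinate is a deliberate simplification: prokaryotic gene finders tend to agree on the stop codon and differ on the start, so disagreement is resolved only at the 5' end. A real combiner would likely also weight predictors and use homology evidence.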

AVAILABILITY AND IMPLEMENTATION

The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems.
