A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

Author information

Thakur Shalabh, Guttman David S

Affiliations

Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada.

Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, Canada.

Publication information

BMC Bioinformatics. 2016 Jun 30;17(1):260. doi: 10.1186/s12859-016-1142-2.

Abstract

BACKGROUND

Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are a number of excellent tools developed for performing this task, most scale poorly when faced with hundreds of genome sequences, and many require extensive manual curation.

RESULTS

We have developed a de-novo genome analysis pipeline (DeNoGAP) for the automated, iterative and high-throughput analysis of data from comparative genomics projects involving hundreds of whole genome sequences. The pipeline is designed to perform reference-assisted and de novo gene prediction, homolog protein family assignment, ortholog prediction, functional annotation, and pan-genome analysis using a range of proven tools and databases. While most existing methods scale quadratically with the number of genomes since they rely on pairwise comparisons among predicted protein sequences, DeNoGAP scales linearly since the homology assignment is based on iteratively refined hidden Markov models. This iterative clustering strategy enables DeNoGAP to handle a very large number of genomes using minimal computational resources. Moreover, the modular structure of the pipeline permits easy updates as new analysis programs become available.
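The scaling argument above can be illustrated with a minimal sketch. It is not DeNoGAP's code: the function names are hypothetical, and a shared-prefix test stands in for scoring a protein against a family's hidden Markov model. The sketch only shows the shape of the two strategies, assuming that each genome is processed once against the families built so far, whereas an all-vs-all approach has to compare every pair of genomes.

```python
from typing import Callable, List, Tuple


def all_vs_all_genome_pairs(n_genomes: int) -> int:
    """Number of genome pairs an all-vs-all comparison must process."""
    return n_genomes * (n_genomes - 1) // 2


def shares_prefix(protein: str, representative: str, k: int = 3) -> bool:
    """Toy similarity test (assumption): a stand-in for scoring a protein
    against a family model. Real pipelines use profile HMM searches."""
    return protein[:k] == representative[:k]


def assign_families_iteratively(
    genomes: List[List[str]],
    matches: Callable[[str, str], bool] = shares_prefix,
) -> Tuple[List[List[str]], int]:
    """Add genomes one at a time, comparing each protein only against the
    current families instead of against every other genome.

    Returns the families and the number of genome-level passes, which
    grows linearly with the number of genomes.
    """
    families: List[List[str]] = []
    genome_passes = 0
    for proteins in genomes:
        genome_passes += 1  # one pass per genome against the family set
        for protein in proteins:
            for family in families:
                # family[0] plays the role of the family's model here
                if matches(protein, family[0]):
                    family.append(protein)
                    break
            else:
                # no existing family matched: start a new one
                families.append([protein])
    return families, genome_passes


if __name__ == "__main__":
    toy_genomes = [["MKTAA", "MLSPQ"], ["MKTAV", "MGGHW"], ["MLSPR"]]
    families, passes = assign_families_iteratively(toy_genomes)
    print(families)  # [['MKTAA', 'MKTAV'], ['MLSPQ', 'MLSPR'], ['MGGHW']]
    print(passes)    # 3
    print(all_vs_all_genome_pairs(200))  # 19900
```

In this toy version the family representative never changes; the abstract describes models that are iteratively refined as members are added, which the sketch does not attempt to reproduce. The cost of each pass still depends on how many families exist, so the linear behaviour refers to the number of genome-level passes.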

CONCLUSION

DeNoGAP integrates bioinformatics tools and databases for comparative analysis of a large number of genomes. The pipeline offers tools and algorithms for annotation and analysis of completed and draft genome sequences. The pipeline was developed using Perl, BioPerl and SQLite on Ubuntu Linux version 12.04 LTS. Currently, the software package is accompanied by a script for automated installation of the necessary external programs on Ubuntu Linux; however, the pipeline should also be compatible with other Linux and Unix systems once the necessary external programs are installed. DeNoGAP is freely available at https://sourceforge.net/projects/denogap/ .


Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d626/4929753/ebc2a6d7923c/12859_2016_1142_Fig1_HTML.jpg
