Porollo Aleksey
Center for Autoimmune Genomics and Etiology, Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH 45229, USA.
Source Code Biol Med. 2014 Sep 2;9:19. doi: 10.1186/1751-0473-9-19. eCollection 2014.
Next-generation sequencing and metagenome projects yield a large number of new genomes that need further annotation, such as identification of enzymes and metabolic pathways, or analysis of the metabolic strategies of newly sequenced species in comparison to known organisms. While methods for enzyme identification are available, development of command-line tools for high-throughput comparative analysis and visualization of identified enzymes is lagging.
A set of Perl scripts has been developed to perform automated data retrieval from the KEGG database using its new REST application programming interface. Enrichment or depletion in metabolic pathways is evaluated using the two-tailed Fisher exact test followed by Benjamini-Hochberg correction.
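The statistical step described above can be sketched as follows. This is a minimal illustration in Python, not the Perl code shipped with the tool. The function names and the layout of the 2x2 table (query enzymes inside and outside a pathway versus reference enzymes inside and outside that pathway) are assumptions made for the example.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration):
      a = query enzymes in the pathway,     b = query enzymes outside it
      c = reference enzymes in the pathway, d = reference enzymes outside it
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy counts for three hypothetical pathways (not real data).
    tables = [(3, 1, 1, 3), (10, 90, 5, 95), (2, 98, 20, 80)]
    raw = [fisher_exact_two_tailed(*t) for t in tables]
    for t, p, q in zip(tables, raw, benjamini_hochberg(raw)):
        print(t, p, q)
```

A pathway would then be reported as enriched or depleted when its adjusted p-value falls below a chosen threshold, with the direction taken from comparing the two in-pathway proportions.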
Comparative analysis of a given set of enzymes with a specified reference organism includes mapping to known metabolic pathways, finding shared and unique enzymes, generating links to visualize maps at KEGG Pathway, computing enrichment of the pathways, and listing the non-mapped enzymes.
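The set comparison listed above might look roughly like the sketch below, written in Python rather than the tool's Perl. The function names, the result keys, and the assumed input format (two tab-separated columns such as `ec:1.1.1.1` and `path:map00010`, in the style of a KEGG REST link table) are illustrative assumptions, not the tool's actual interface.

```python
def parse_link_table(text):
    """Parse two-column, tab-separated 'ec:<EC>\\tpath:<ID>' lines.

    The format is an assumption for this sketch; malformed lines are skipped.
    Returns a dict mapping EC number -> set of pathway identifiers.
    """
    mapping = {}
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) != 2 or ":" not in parts[0] or ":" not in parts[1]:
            continue
        ec = parts[0].split(":", 1)[1]
        pathway = parts[1].split(":", 1)[1]
        mapping.setdefault(ec, set()).add(pathway)
    return mapping


def compare_enzymes(query_ecs, reference_ecs, ec_to_pathways):
    """Compare a query enzyme set against a reference organism's enzymes."""
    query, reference = set(query_ecs), set(reference_ecs)
    by_pathway = {}
    for ec in query:
        for pathway in ec_to_pathways.get(ec, ()):
            by_pathway.setdefault(pathway, set()).add(ec)
    return {
        "shared": query & reference,
        "unique_query": query - reference,
        "unique_reference": reference - query,
        # Query enzymes with no known pathway assignment.
        "unmapped": {ec for ec in query if not ec_to_pathways.get(ec)},
        # Query enzymes grouped by the pathway they map to.
        "by_pathway": by_pathway,
    }


if __name__ == "__main__":
    # Toy input; "9.9.9.9" is a made-up EC number used as an unmapped example.
    table = "ec:1.1.1.1\tpath:map00010\nec:2.7.1.1\tpath:map00010\n"
    result = compare_enzymes(
        ["1.1.1.1", "9.9.9.9"], ["1.1.1.1", "2.7.1.1"], parse_link_table(table)
    )
    for key, value in result.items():
        print(key, value)
```

The per-pathway counts from `by_pathway`, together with the corresponding counts for the reference organism, would supply the 2x2 tables needed for the enrichment test.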
EC2KEGG provides a platform-independent toolkit for automated comparison of identified sets of enzymes from newly sequenced organisms against annotated reference genomes. The tool can be used both for manual annotation of individual species and for high-throughput annotation as part of a computational pipeline. The tool is publicly available at http://sourceforge.net/projects/ec2kegg/.