Fast parsers for Entrez Gene.

Authors

Liu Mingyi, Grigoriev Andrei

Affiliation

GPC Biotech AG, Fraunhoferstrasse 20, 82152 Martinsried, Germany.

Publication information

Bioinformatics. 2005 Jul 15;21(14):3189-90. doi: 10.1093/bioinformatics/bti488. Epub 2005 May 6.

Abstract

NCBI completed the transition of its main genome annotation database from LocusLink to Entrez Gene in spring 2005. To date, however, few parsers exist for the Entrez Gene annotation file. Given the widespread use of LocusLink and the popularity of the Perl programming language in bioinformatics, a publicly available, high-performance Entrez Gene parser in Perl is urgently needed. We present four such parsers, each built with a different parsing approach (Parse::RecDescent, Parse::Yapp, Perl-byacc and Perl 5 regular expressions), and provide the first in-depth comparison of these sophisticated Perl tools. Our fastest parser processes the entire human Entrez Gene annotation file in under 12 min on one Intel Xeon 2.4 GHz CPU and can help the bioinformatics community during and after the transition from LocusLink to Entrez Gene.
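
To illustrate the regular-expression approach named in the abstract, here is a minimal sketch in Python (not the authors' Perl code). It assumes the annotation file is in ASN.1 text value notation, tokenizes it with one regular expression, and builds nested lists with a small recursive-descent routine. The function names (`tokenize`, `parse_value`, `parse_item`, `parse_record`) and the `SAMPLE` record are hypothetical: the record is hand-written and heavily simplified, not real Entrez Gene output.

```python
import re

# One regex for every token kind: quoted strings, punctuation, bare atoms.
TOKEN = re.compile(r'''
    \s*(?:
        (?P<string>"(?:[^"]|"")*")   # quoted string; a doubled quote is an escape
      | (?P<punct>::=|[{},])         # assignment, braces, comma
      | (?P<atom>[^\s{},"]+)         # identifiers, numbers, enumerated values
    )''', re.VERBOSE)

END = ('end', '')  # sentinel so truncated input raises ValueError


def tokenize(text):
    """Split ASN.1-like text into (kind, value) pairs."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    tokens.append(END)
    return tokens


def parse_value(tokens, i):
    """Parse a braced block, a string, or an atom; return (value, next_index)."""
    kind, tok = tokens[i]
    if (kind, tok) == ('punct', '{'):
        items, i = [], i + 1
        while tokens[i] != ('punct', '}'):
            item, i = parse_item(tokens, i)
            items.append(item)
            if tokens[i] == ('punct', ','):
                i += 1
        return items, i + 1
    if kind == 'string':
        return tok[1:-1].replace('""', '"'), i + 1
    if kind == 'atom':
        return (int(tok) if re.fullmatch(r'-?\d+', tok) else tok), i + 1
    raise ValueError(f"unexpected token {tok!r}")


def parse_item(tokens, i):
    """Parse one comma-separated entry, e.g. `geneid 1` -> ('geneid', 1)."""
    parts = []
    while tokens[i] not in (('punct', ','), ('punct', '}')):
        value, i = parse_value(tokens, i)
        parts.append(value)
    return (parts[0] if len(parts) == 1 else tuple(parts)), i


def parse_record(text):
    """Parse `TypeName ::= { ... }` into (type_name, nested_value)."""
    tokens = tokenize(text)
    if tokens[0][0] != 'atom' or tokens[1] != ('punct', '::='):
        raise ValueError("expected 'TypeName ::='")
    value, _ = parse_value(tokens, 2)
    return tokens[0][1], value


# Hand-written, simplified record used only for illustration.
SAMPLE = '''
Entrezgene ::= {
  track-info {
    geneid 1,
    status live
  },
  gene {
    locus "A1BG",
    desc "alpha-1-B glycoprotein"
  }
}
'''

if __name__ == "__main__":
    type_name, record = parse_record(SAMPLE)
    print(type_name, record)
```

Generator-based tools such as Parse::RecDescent or Parse::Yapp would instead take a declarative grammar; the hand-rolled version above only shows the tokenize-then-descend idea and makes no claim about relative speed.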

