Paper2sequences: retrieval of sequences listed in a publication.

Author information

Mersch Henning, Fuellen Georg

Affiliation

Technische Fakultät, Universität Bielefeld, Bielefeld, Germany.

Publication information

Appl Bioinformatics. 2003;2(2):113-6.

PMID:15130827

Abstract

Our web-based tool simplifies the often laborious task of retrieving a set of biosequences listed in a publication or on a webpage. As a front-end to the Bioperl toolkit, it accepts a list of identifiers as input. The identifiers are specified in an ASCII table (copy-pasted from the publication's PDF or HTML page) and give rise to queries against multiple databases for the specified protein or nucleic acid data. Currently, GenBank, PIR (Protein Information Resource) and Swiss-Prot are supported. For each sequence accession code listed, the database can be specified and, if retrieval fails, automatic lookup of the same code in the other databases can be requested. If a single accession contains multiple protein coding sequences (CDS), sequence length information (if specified) and heuristic rules are used to guide the lookup. Warnings are issued in cases of ambiguity or inconsistency. An advanced option lets the user specify the output format.
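
The fallback lookup and the length-guided CDS choice described above could be sketched roughly as follows. This is a minimal illustration in Python, not the tool's actual Bioperl code: the names `Record`, `lookup` and `pick_cds` are hypothetical, the fetchers are plain callables standing in for real database clients, and the specific heuristics (exact length match first, then nearest length, else the first CDS) are assumptions, since the abstract does not state the rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Record:
    """Hypothetical stand-in for a retrieved database entry."""
    accession: str
    database: str
    cds_lengths: List[int]  # length of each CDS annotated in the entry


# A fetcher maps an accession code to a Record, or None if not found.
Fetcher = Callable[[str], Optional[Record]]


def lookup(accession: str, preferred: str, fetchers: Dict[str, Fetcher],
           fallback: bool = True) -> Tuple[Optional[Record], List[str]]:
    """Query the preferred database first, then (optionally) the others."""
    warnings: List[str] = []
    order = [preferred]
    if fallback:
        order += [name for name in fetchers if name != preferred]
    for name in order:
        record = fetchers[name](accession)
        if record is not None:
            if name != preferred:
                warnings.append(
                    f"{accession}: not in {preferred}, retrieved from {name}")
            return record, warnings
    warnings.append(f"{accession}: not found in {', '.join(order)}")
    return None, warnings


def pick_cds(cds_lengths: List[int], expected_length: Optional[int]
             ) -> Tuple[Optional[int], List[str]]:
    """Choose one CDS index; the rules here are assumed, not the paper's."""
    warnings: List[str] = []
    if not cds_lengths:
        return None, ["no CDS annotated in this entry"]
    if expected_length is None:
        if len(cds_lengths) > 1:
            warnings.append("multiple CDS and no length given; using the first")
        return 0, warnings
    exact = [i for i, n in enumerate(cds_lengths) if n == expected_length]
    if len(exact) == 1:
        return exact[0], warnings
    if len(exact) > 1:
        warnings.append("several CDS match the given length; using the first")
        return exact[0], warnings
    # No exact match: fall back to the CDS with the closest length.
    best = min(range(len(cds_lengths)),
               key=lambda i: abs(cds_lengths[i] - expected_length))
    warnings.append(
        f"no CDS of length {expected_length}; using closest ({cds_lengths[best]})")
    return best, warnings
```

Passing the fetchers in as a dictionary keeps the sketch independent of any particular database client, so in-memory stubs can be swapped for real GenBank, PIR or Swiss-Prot queries. Both functions return their warnings alongside the result, mirroring the abstract's point that ambiguities and inconsistencies are reported rather than silently resolved.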