Key2Ann: a tool to process sequence sets by replacing database identifiers with a human-readable annotation.

Author information

Pürzer Andreas, Grassmann Felix, Birzer Dietmar, Merkl Rainer

Affiliation

University of Applied Sciences, Department of Computer Science and Mathematics, 93025 Regensburg, Germany.

Publication information

J Integr Bioinform. 2011 Mar 4;8(1):539. doi: 10.2390/biecoll-jib-2011-153.

DOI:10.2390/biecoll-jib-2011-153

PMID:21372341

Abstract

Deducing common properties or degrees of phylogenetic relationship by analyzing a grouping or clustering of sequence sets is a frequently used technique in computational biology. When the results are interpreted by visual inspection, the conclusions in many of these applications depend on meaningful names for the input data. Depending on the aim of the analysis, the sequences should carry names that indicate the function of the genes or gene products, the phylogenetic position, or other properties characterizing the contributing species. However, sequences extracted from databases are most often annotated with identifiers that contain the desired information only implicitly. To solve this problem, we have designed and implemented a tool named Key2Ann, which replaces the database keys in multiple FASTA files with short terms indicating the taxonomic position or other features such as the gene name or the EC number. In addition, properties such as habitat, growth temperature, or degree of pathogenicity can be encoded for microbial species. For maximum flexibility, the user can control the composition of the names by means of command-line parameters. Key2Ann is written in Java and can be downloaded via http://www-bioinf.uni-regensburg.de/downl/Key2Ann.zip. We demonstrate the usage of Key2Ann by discussing three typical examples of phylogenetic analysis.
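
The core idea of replacing database keys in FASTA headers with readable names can be illustrated with a minimal Java sketch. This is not Key2Ann's implementation or its command-line interface. The class name HeaderRelabelSketch, the method relabel, the identifiers ID0001 and ID0002, and the labels are all hypothetical, and the sketch assumes that the database key is the first whitespace-delimited token after the '>' character.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not the Key2Ann source code.
public class HeaderRelabelSketch {

    // Replace the key in each FASTA header with a readable label.
    // Assumption: the key is the first whitespace-delimited token after '>'.
    // Headers whose key has no label are kept unchanged.
    static List<String> relabel(List<String> fastaLines, Map<String, String> labels) {
        List<String> out = new ArrayList<>();
        for (String line : fastaLines) {
            if (line.startsWith(">")) {
                String key = line.substring(1).trim().split("\\s+", 2)[0];
                String label = labels.get(key);
                out.add(label != null ? ">" + label : line);
            } else {
                out.add(line); // sequence lines pass through untouched
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical key-to-label table; a real tool would derive it from database annotation.
        Map<String, String> labels = new HashMap<>();
        labels.put("ID0001", "Proteobacteria_geneA");
        labels.put("ID0002", "Archaea_geneA");

        List<String> fasta = Arrays.asList(
                ">ID0001 hypothetical protein",
                "MKTAYIAK",
                ">ID0002 hypothetical protein",
                "MAALSGK",
                ">ID0003 unknown entry",
                "MSTNPKP");

        for (String line : relabel(fasta, labels)) {
            System.out.println(line);
        }
    }
}
```

In this sketch, a header without a matching label is left as it was, so that no sequence loses its identifier silently. How Key2Ann itself treats such cases is not stated in the abstract.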
