FISH: a guide to protein-coding DNA sequences in the GenBank database.

Author information

Collins D W

Affiliation

Space Sciences Laboratory, Oakland, CA 94608.

Publication information

Comput Appl Biosci. 1993 Jun;9(3):337-42. doi: 10.1093/bioinformatics/9.3.337.

DOI:10.1093/bioinformatics/9.3.337

PMID:8324634

Abstract

FISH (Fast Index Search for Homologous coding sequences) consists of a database and associated software and is intended to function as a directory of protein-coding gene sequences. The FISH index contains descriptions of 22,361 DNA sequences from release 69.0 of the GenBank genetic sequence database. Complete coding sequences are represented numerically with counts of nucleotides and synonymous codons, and with GenBank LOCUS names and short descriptions. The software permits the database to be queried by GenBank LOCUS name, sequence length (expressed as total number of codons), or by comparison with a DNA sequence. In the latter case, the numerical descriptions are compared with simple distance measures in place of actual DNA sequences. The FISH package can be used to rapidly assemble lists of similar coding sequences, without regard to functional annotation or sequence alignments. Typical search times are well under a minute on widely available IBM-compatible microcomputers.
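
The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the FISH implementation: the function names (`describe`, `distance`, `search`), the LOCUS names, the use of all 64 codon counts in place of the paper's synonymous-codon counts, and the Manhattan distance on relative codon frequencies are all assumptions chosen for illustration, since the abstract does not specify the exact descriptors or distance measures.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# All 64 codons in a fixed order, so every description has the same layout.
CODONS = ["".join(p) for p in product(BASES, repeat=3)]


def describe(cds):
    """Build a numerical description of a complete coding sequence.

    Hypothetical layout: codon total, nucleotide counts, codon counts.
    """
    seq = cds.upper()
    if not seq or len(seq) % 3 != 0:
        raise ValueError("expected a non-empty sequence of whole codons")
    base_counts = Counter(seq)
    codon_counts = Counter(seq[i:i + 3] for i in range(0, len(seq), 3))
    return {
        "n_codons": len(seq) // 3,
        "bases": [base_counts[b] for b in BASES],
        "codons": [codon_counts[c] for c in CODONS],
    }


def distance(a, b):
    """Manhattan distance between relative codon frequencies.

    An assumed stand-in for the 'simple distance measures' in the abstract.
    """
    fa = [n / a["n_codons"] for n in a["codons"]]
    fb = [n / b["n_codons"] for n in b["codons"]]
    return sum(abs(x - y) for x, y in zip(fa, fb))


def search(query_cds, index, top=5):
    """Rank index entries by distance to the query; no alignment is done."""
    q = describe(query_cds)
    scored = [(distance(q, d), name) for name, d in index.items()]
    return sorted(scored)[:top]


# Toy index keyed by made-up LOCUS names (not real GenBank entries).
index = {
    "LOCUS_A": describe("ATGGCCAAATAA"),
    "LOCUS_B": describe("ATGGCCAAGTAA"),
    "LOCUS_C": describe("ATGTTTTTTTAA"),
}
results = search("ATGGCCAAATAA", index)
for dist, name in results:
    print(name, dist)
```

Because only fixed-length count vectors are compared, each lookup costs the same regardless of sequence length, which is consistent with the short search times the abstract reports.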
