Design and assessment of a fast algorithm for identifying specific probes for human and mouse genes.

Author information

Chang Pei-Chin, Peck Konan

Affiliation

Institute of Biomedical Science, Academia Sinica, Taipei, Taiwan 115, ROC.

Publication information

Bioinformatics. 2003 Jul 22;19(11):1311-7. doi: 10.1093/bioinformatics/btg162.

DOI:10.1093/bioinformatics/btg162

PMID:12874041

Abstract

MOTIVATION

Mammalian genomes are highly complex. Identifying the unique sequences of each gene in a mammalian gene database containing tens of thousands of DNA sequences is a computationally intensive task. With the advent of parallel genetic analysis methods such as microarrays and the growing availability of whole-genome sequences of organisms, an algorithm that rapidly identifies unique gene probes for functional studies of individual genes would be a very useful tool.

RESULTS

We have developed a fast algorithm, and a software program based on it, for identifying gene-specific probes in complex organisms. Applied to assemblies of gene sequences, the algorithm was highly efficient on large databases such as the TIGR human THC and mouse TC databases. The results were assessed with the BLAST sequence alignment software. Two probe data sets have been compiled, containing specific probes for around 100 000 putative human gene transcripts and 70 000 putative mouse gene transcripts.
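The abstract does not describe how the algorithm works. As a purely illustrative sketch, and not the authors' method, one common way to frame probe specificity is to index every length-k substring (k-mer) in the database and keep, for each sequence, the k-mers that occur in no other sequence. The function name `unique_kmers` and the parameter `k` are hypothetical. The sketch counts exact matches on one strand only; a practical tool would also need to consider reverse complements and near-matches (the authors assessed their results with BLAST).

```python
from collections import defaultdict


def unique_kmers(seqs: dict[str, str], k: int) -> dict[str, list[tuple[int, str]]]:
    """Hypothetical helper: for each sequence ID, return the (position, k-mer)
    pairs whose k-mer occurs in no other sequence of the database.

    Assumption: exact matches on the given strand only; reverse complements
    and mismatched near-matches are ignored in this sketch.
    """
    # Pass 1: map each k-mer to the set of sequence IDs that contain it.
    owners: dict[str, set[str]] = defaultdict(set)
    for sid, seq in seqs.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(sid)

    # Pass 2: keep only k-mers owned by exactly one sequence (this one).
    result: dict[str, list[tuple[int, str]]] = {}
    for sid, seq in seqs.items():
        seq = seq.upper()
        result[sid] = [
            (i, seq[i:i + k])
            for i in range(len(seq) - k + 1)
            if owners[seq[i:i + k]] == {sid}
        ]
    return result


if __name__ == "__main__":
    # Toy two-gene database; real probes would use a much larger k.
    db = {"geneA": "ACGTACGTTT", "geneB": "ACGTACGAAA"}
    for gene, hits in unique_kmers(db, k=4).items():
        print(gene, hits)
    # prints:
    # geneA [(5, 'CGTT'), (6, 'GTTT')]
    # geneB [(4, 'ACGA'), (5, 'CGAA'), (6, 'GAAA')]
```

Building a single hash index over all k-mers keeps the work roughly linear in total database length, at the cost of memory proportional to the number of distinct k-mers; this is one reason whole-database uniqueness checks are feasible even for tens of thousands of sequences.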

AVAILABILITY

The gene-specific probes for the putative human and mouse genes referenced in the TIGR gene indices are available at: ftp://genestamp.ibms.sinica.edu.tw/pub/SpecificP/. The software program and source code are available upon request.
