Chang Pei-Chin, Peck Konan
Institute of Biomedical Science, Academia Sinica, Taipei, Taiwan 115, ROC.
Bioinformatics. 2003 Jul 22;19(11):1311-7. doi: 10.1093/bioinformatics/btg162.
Mammalian genomes are highly complex. Identifying the unique sequences of each gene in a mammalian gene database containing tens of thousands of DNA sequences is a computationally intensive task. With the advent of parallel genetic analysis methods such as microarrays, and with whole genome sequences available for more and more organisms, an algorithm that speedily identifies unique gene probes for functional studies of individual genes will be a very useful tool.
We have developed a fast algorithm, and a software program based on it, for identifying gene-specific probes in complex organisms. The algorithm was applied to assemblies of gene sequences and was highly efficient for large databases such as the TIGR human THC and mouse TC databases. The results were assessed with the BLAST sequence alignment software. Two probe data sets were compiled, containing specific probes for around 100 000 putative human gene transcripts and 70 000 putative mouse gene transcripts.
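The abstract does not describe how the algorithm works, so the following is only a minimal illustrative sketch of the underlying problem, not the authors' method. It assumes a naive exact-match criterion: a candidate probe is a length-k subsequence that occurs in exactly one sequence of the database, on either strand. The function names, the toy sequences, and the choice of k are all hypothetical.

```python
from collections import defaultdict

# Translation table for complementing DNA bases (assumes upper-case A/C/G/T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def unique_kmers(sequences, k):
    """Map each sequence id to its k-mers that occur in no other sequence.

    `sequences` is a dict of {sequence_id: DNA string}. A k-mer counts as
    shared if it appears in another sequence on either strand. This is a
    hypothetical exact-match baseline, not the published algorithm.
    """
    # Pass 1: record which sequences contain each k-mer (both strands).
    owners = defaultdict(set)
    for seq_id, seq in sequences.items():
        seq = seq.upper()
        for strand in (seq, reverse_complement(seq)):
            for i in range(len(strand) - k + 1):
                owners[strand[i:i + k]].add(seq_id)

    # Pass 2: keep the forward-strand k-mers owned by exactly one sequence.
    result = {seq_id: [] for seq_id in sequences}
    for seq_id, seq in sequences.items():
        seq = seq.upper()
        seen = set()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer not in seen and owners[kmer] == {seq_id}:
                result[seq_id].append(kmer)
                seen.add(kmer)
    return result


if __name__ == "__main__":
    toy = {"geneA": "ACGTTTGA", "geneB": "CCGTTTGC"}  # made-up sequences
    for gene, probes in unique_kmers(toy, k=4).items():
        print(gene, probes)
```

Exact matching is a much weaker specificity test than practical probe design requires, since near-identical sequences can still cross-hybridize. That is presumably why candidates would still need checking with an alignment tool, as the authors did with BLAST.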
The gene-specific probes for the putative human and mouse genes referenced in the TIGR gene indices are available at: ftp://genestamp.ibms.sinica.edu.tw/pub/SpecificP/. The software program and the source code are available upon request.