Fondrat C, Dessen P, Le Beux P
Nucleic Acids Res. 1986 Jan 10;14(1):197-204. doi: 10.1093/nar/14.1.197.
We propose a new method for homology searches of nucleic acid or protein sequences in databanks. Every possible subsequence of a specified length in a sequence is converted into a code and stored in an indexed file (hash-coding). This preliminary encoding of an entire bank is rather time-consuming, but it gives immediate access to all the sequence fragments of a given type. With our method, for example, a strict homology pattern of twenty nucleotides can be found in the Los Alamos bank (GENBANK) in less than 2 seconds. The same data storage can also be used to considerably speed up non-strict homology search programs and to write a program that helps in selecting nucleic acid hybridization probes.
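The hash-coding idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 2-bit-per-base encoding, the in-memory dictionary standing in for the indexed file, and the names `encode`, `build_index`, and `find_exact` are all assumptions made for illustration. Every subsequence of length `k` is converted to an integer code and indexed once; an exact search then looks up the code of the query's first `k` bases and verifies the remainder against the stored sequence.

```python
from collections import defaultdict

# Assumed encoding: 2 bits per nucleotide (the abstract does not specify the code).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(word):
    """Pack a nucleotide word into an integer; return None if a base is unknown."""
    code = 0
    for base in word:
        value = BASE_CODE.get(base)
        if value is None:
            return None
        code = (code << 2) | value
    return code


def build_index(sequences, k):
    """Map each k-mer code to the (sequence name, offset) pairs where it occurs.

    This dictionary is a hypothetical in-memory stand-in for the indexed file.
    """
    index = defaultdict(list)
    for name, seq in sequences.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            code = encode(seq[i:i + k])
            if code is not None:
                index[code].append((name, i))
    return index


def find_exact(index, sequences, query, k):
    """Return exact occurrences of query by index lookup plus verification."""
    query = query.upper()
    if len(query) < k:
        raise ValueError("query must be at least k bases long")
    code = encode(query[:k])
    if code is None:
        return []
    hits = []
    for name, pos in index.get(code, []):
        # The index only guarantees the first k bases; check the full query.
        if sequences[name][pos:pos + len(query)].upper() == query:
            hits.append((name, pos))
    return hits


# Toy "bank" of two hypothetical sequences.
bank = {"seq1": "ACGTACGTGA", "seq2": "TTACGTGACC"}
index = build_index(bank, k=4)
print(find_exact(index, bank, "ACGTGA", k=4))
```

Building the index costs one pass over the whole bank, which corresponds to the long preliminary step described above; each later query touches only the positions stored under a single code, which is why lookups can be fast.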