Spang R, Vingron M
Deutsches Krebsforschungszentrum, Theoretische Bioinformatik, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany.
Bioinformatics. 2001 Apr;17(4):338-42. doi: 10.1093/bioinformatics/17.4.338.
Noise in database searches resulting from random sequence similarities increases as the databases expand rapidly. The noise problems are not a technical shortcoming of the database search programs, but a logical consequence of the idea of homology searches. The effect can be observed in simulation experiments.
We have investigated noise levels in pairwise-alignment-based database searches. The noise levels of 38 releases of the SwissProt database display perfect logarithmic growth with the total length of the databases. Clustering of real biological sequences reduces noise levels, but the effect is marginal.
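The logarithmic relationship described above can be illustrated with a minimal sketch. The abstract does not say how noise was measured or fitted, so everything here is an assumption: the function name `fit_log_growth`, the least-squares model noise = intercept + slope * ln(total length), and the input values, which are synthetic and not SwissProt measurements.

```python
import math

def fit_log_growth(total_lengths, noise_levels):
    """Least-squares fit of noise = intercept + slope * ln(total_length).

    Hypothetical helper for illustration; not the authors' procedure.
    """
    xs = [math.log(length) for length in total_lengths]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(noise_levels) / n
    # Ordinary least squares on the log-transformed database length.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, noise_levels))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic inputs (assumed for the sketch): three database sizes and
# noise values generated from an exactly logarithmic law.
lengths = [1e6, 1e7, 1e8]
noise = [2.0 + 3.0 * math.log(length) for length in lengths]
slope, intercept = fit_log_growth(lengths, noise)
```

With real data, one would pass the total residue count of each database release and the observed noise level for that release; a near-constant residual around the fitted line would be consistent with logarithmic growth.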