Lawrence S, Giles CL
Computer Science, NEC Research Institute, 4 Independence Way, Princeton, NJ 08540, USA. E-mail:
Science. 1998 Apr 3;280(5360):98-100. doi: 10.1126/science.280.5360.98.
The coverage and recency of the major World Wide Web search engines were analyzed, yielding some surprising results. The coverage of any one engine is significantly limited: No single engine indexes more than about one-third of the "indexable Web," the coverage of the six engines investigated varies by an order of magnitude, and combining the results of the six engines yields, on average, about 3.5 times as many documents as the results from only one engine. Analysis of the overlap between pairs of engines gives an estimated lower bound on the size of the indexable Web of 320 million pages.
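The overlap-based size estimate can be illustrated with a capture-recapture (Lincoln-Petersen) calculation. This is a minimal sketch under stated assumptions, not the authors' exact procedure. It treats each engine's index as a set of page ids. The function names and the toy 10 x 10 "Web" of page ids are hypothetical. If two engines indexed pages independently, |A|·|B| / |A∩B| would estimate the total number of pages. Engines tend to favor the same popular pages, though. That inflates the overlap and pushes the estimate down, which is one plausible reading of why such a figure is treated as a lower bound. The `union_gain` helper shows one way to interpret the "3.5 times" figure: the size of the combined result set relative to the average single engine.

```python
from typing import AbstractSet, Hashable, Sequence


def overlap_size_estimate(index_a: AbstractSet[Hashable],
                          index_b: AbstractSet[Hashable]) -> float:
    """Lincoln-Petersen estimate of total pages from two engine indexes.

    Assumption (illustrative): the two indexes are independent samples.
    Positive correlation between engines inflates the overlap, so the
    estimate is biased downward.
    """
    overlap = len(index_a & index_b)
    if overlap == 0:
        raise ValueError("no overlap between indexes: estimate is undefined")
    return len(index_a) * len(index_b) / overlap


def union_gain(indexes: Sequence[AbstractSet[Hashable]]) -> float:
    """Size of the union of all indexes relative to the mean single-index size."""
    if not indexes:
        raise ValueError("need at least one index")
    combined = set().union(*indexes)
    mean_single = sum(len(s) for s in indexes) / len(indexes)
    return len(combined) / mean_single


# Hypothetical toy Web: 100 page ids on a 10 x 10 grid (page = 10 * row + col).
pages = {10 * r + c for r in range(10) for c in range(10)}
engine_a = {p for p in pages if p // 10 < 3}       # rows 0-2: 30 pages
engine_b = {p for p in pages if p % 10 < 4}        # cols 0-3: 40 pages, independent of engine_a
engine_b_corr = {p for p in pages if p // 10 < 4}  # rows 0-3: 40 pages, mostly the same pages as engine_a

print(overlap_size_estimate(engine_a, engine_b))       # 100.0 (overlap 12, recovers the true total)
print(overlap_size_estimate(engine_a, engine_b_corr))  # 40.0 (overlap 30, a large underestimate)
print(union_gain([engine_a, engine_b]))                # 58 / 35, about 1.66
```

In the toy grid, the independent pair recovers the true total exactly. The correlated pair shares far more pages than chance would predict and badly underestimates it, which is the direction of bias the abstract's "lower bound" wording implies.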