Thursday, June 03, 2004

Google testing new spider?: Marketing Monitor - May. 27, 2004

"With this new Googlebot crawler working over the HTTP 1.1 protocol, it is much faster than 'traditional' crawlers. Rather than requesting one page at a time, the new crawler can request multiple pages over a single connection. HTTP 1.1 supports persistent connections and pipelining, which means several requests can be sent back-to-back without waiting for each response to arrive. I won't go into too many details here, but with HTTP 1.0 a connection typically handled one request at a time, so a crawler could only really receive one page before requesting additional pages. With HTTP 1.1 the crawler can now send many requests over one connection and receive the responses in order.
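The pipelining idea described above can be sketched in a few lines. This is a minimal illustration, not Google's actual crawler code: the `build_pipelined_requests` helper is a hypothetical name, and it simply concatenates several HTTP/1.1 GET requests so they could be written to one persistent connection in a single burst, with the final request asking the server to close the connection.

```python
def build_pipelined_requests(host, paths):
    """Concatenate HTTP/1.1 GET requests for pipelining over one connection.

    Each request carries a Host header (required by HTTP/1.1); all but the
    last ask to keep the connection alive, and the last asks to close it.
    """
    requests = []
    for i, path in enumerate(paths):
        last = (i == len(paths) - 1)
        requests.append(
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Connection: {'close' if last else 'keep-alive'}\r\n"
            "\r\n"
        )
    return "".join(requests).encode("ascii")

# A pipelining client would send this whole payload at once, then read
# the responses back in the same order the requests were sent.
payload = build_pipelined_requests("example.com", ["/", "/about.html"])
```

Under HTTP/1.0, by contrast, each of those requests would normally mean a fresh TCP connection and a full round trip before the next page could even be asked for, which is the difference in crawl speed the article is pointing at.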

And this is why it looked like the crawler wasn't following a path through the site - because it wasn't. This crawler was making multiple requests at one time. It likely had a queue of URLs that another crawler had told it to fetch, so it was going out and retrieving those pages.
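The division of labour described there, one component discovering URLs and a separate fetcher working through them, is the classic "URL frontier" pattern. A minimal sketch of that hand-off might look like this; the names `discover` and `next_batch` are illustrative, not anything from Google:

```python
from collections import deque

# The frontier: URLs found by a discovery crawler, waiting to be fetched.
frontier = deque()

def discover(urls):
    """Discovery side: append newly found URLs to the shared frontier."""
    frontier.extend(urls)

def next_batch(n):
    """Fetcher side: take up to n URLs to request (e.g. pipelined together)."""
    batch = []
    while frontier and len(batch) < n:
        batch.append(frontier.popleft())
    return batch
```

Because the fetcher pulls whatever batch is next in the queue, its requests need not follow any link path through a single site, which matches the access pattern the article observed in its logs.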

So what does this mean to you? Well, if Google continues to run these crawlers, then its index should start to refresh more often. Also, new sites should be indexed sooner. Plus, Google will be able to do the same amount of indexing with fewer crawlers (or, conversely, index more of the web with the same number of crawlers). "

Google
This work is licensed under a Creative Commons License.