Crawl Date in Google's Cache: Matt Cutts Video Transcript



OK everybody, we have a new illustration today. Vanessa Fox talked about this on the Google Webmaster Central blog. Some people like to learn visually, some people like to learn from screenshots, so I thought I'd make a little movie. This is going to be a multimedia presentation; the two media we are using today are Skittles and peanut butter M&Ms. So let's talk about Googlebot and how it crawls the web. First off, what do the red M&Ms represent? Well, everyone knows red is bad, so these are going to be 404s. Googlebot is crawling around the web, it sees a 404 and sucks it down, and then later on it will come back to try to check it again.



So what do the purple ones mean? Well, everybody knows purple means an HTTP status code of 200 OK. That's the only thing it could possibly represent. In other words, Googlebot comes along and sucks up the page, and we got the page just fine. So we've got a 404 and a couple of HTTP 200s, and life is pretty good.

Next, let's talk about the cache crawl date and what it represents. You may not be able to tell easily, but this one is purple, then we've got two greens, then a purple, and the rest are greens. So what do you think the green M&Ms represent? Everybody knows the green M&Ms are great, they're the good ones, so green represents a status code of 304. A browser or Googlebot comes to a page and says, "Hey, I want a copy of this page, or you can just tell me whether the page has been modified since I last indexed it." If the page has not been modified since a certain date, you get a 304 status back saying that the page hasn't changed, and Googlebot does not have to download that page again.

So this is what Googlebot does, going forward in time. We crawl a page and get a 200. The next two times Googlebot crawls the page it gets a 304, which is the response to If-Modified-Since saying that the page hasn't changed. Later on, here, the webmaster actually changed the page, and we see this purple again: the page has changed since the last crawl, so now we get a 200 and the page is actually fetched.
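
The If-Modified-Since exchange described above can be sketched from the web server's side. This is only a minimal illustration, not how Googlebot or any particular server is implemented: the function name `respond`, the page body, and the dates are made up for the example, and the comparison is simply "has the page changed after the date the crawler sent?".

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_modified, if_modified_since=None):
    """Return (status, body) for a GET request.

    last_modified: timezone-aware datetime of the page's last change.
    if_modified_since: value of the If-Modified-Since header, or None.
    (Hypothetical helper, for illustration only.)
    """
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        if last_modified <= since:
            # Page unchanged since the crawler's copy: no body is sent.
            return 304, b""
    # First visit, or the page changed: send the full page.
    return 200, b"<html>page content</html>"


# Made-up dates: the page was written on Jan 10 and first crawled on Jan 15.
page_changed = datetime(2007, 1, 10, tzinfo=timezone.utc)
first_crawl = format_datetime(
    datetime(2007, 1, 15, tzinfo=timezone.utc), usegmt=True
)

print(respond(page_changed))               # first fetch, no header
print(respond(page_changed, first_crawl))  # revisit, page unchanged
```

The first call corresponds to a purple (200) and the second to a green (304); if the webmaster edits the page after the first crawl, the same revisit returns a 200 again.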



Now, going forward, the page didn't change, so the web server is smart enough to return a 304 status code for each of the visits by Googlebot. The interesting thing is that if you look at Google's cached copy of a page, it shows the date the page was last retrieved. But until recently, even though we checked on this date and this date, the cache would still show the very first time that we fetched that page. When we fetched the page again, it would show that new cache crawl date, and this would continue: maybe for six months, if the page hadn't changed, we would still show the old cache crawl date. So the change in policy is that if we check on this date and on this date to see whether the page has changed, we will now show that date as the cache crawl date. In other words, as Googlebot comes along, sucking stuff up, a page that used to look pretty old in the cache now gets updated: whether or not the page has changed, we update the crawl date on the cached page. So pages look fresher in the cache, because the date we show reflects the fact that we have recently checked whether the page has changed.
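
The difference between the old and new behavior can be sketched as a small function over a crawl history. This is an illustrative model only, assuming the policy works exactly as described in the transcript; the function name `cache_crawl_dates` and the sample dates are hypothetical and do not reflect Google's actual code.

```python
from datetime import date


def cache_crawl_dates(crawls):
    """Return (old_policy_date, new_policy_date) for a crawl history.

    crawls: list of (day, http_status) tuples in chronological order.
    Old policy: the date only moves when the page is actually fetched (200).
    New policy: the date also moves when the page is checked and found
    unchanged (304).
    """
    old_date = new_date = None
    for day, status in crawls:
        if status == 200:
            old_date = day
            new_date = day
        elif status == 304:
            new_date = day
        # Other statuses (such as 404) leave both dates untouched.
    return old_date, new_date


# Made-up history: one real fetch followed by two "not modified" checks.
history = [
    (date(2007, 1, 1), 200),
    (date(2007, 1, 8), 304),
    (date(2007, 1, 15), 304),
]
old_date, new_date = cache_crawl_dates(history)
print("old policy shows:", old_date)
print("new policy shows:", new_date)
```

Under the old policy the cached page keeps showing the date of the first fetch, while under the new one it shows the most recent check, which is why unchanged pages now look fresher.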
