Matt Cutts Discusses Webmaster Tools - Matt Cutts Video Transcript
I am up in the Kirkland office today. I'm up here for a little bit of outside planning, and they said, hey, why don't we throw together a video in 10 minutes or less? So we said, all right, let's give it a shot. We were thinking about some of the common things you do with the webmaster console, or some topics webmasters want to hear about. People like to go to the webmaster console and check their backlinks.
They also like to know if they have any penalties, and there are a lot of really good stats in the webmaster console. One thing I have been hearing questions about is: how do I remove my URLs from Google? Why would you want to do this? Well, suppose you run a school and you accidentally left the Social Security numbers of all your students up on the web, or you run a store and you left people's credit card numbers exposed. Or you run a forum and suddenly it is spammed full of porn by a Ukrainian forum spammer, which happened to a friend of mine recently. So, whatever the reason, you want some URLs out of Google instead of getting URLs into Google. Let's look at some of the possible approaches, some of the different ways you can do it.
What I'll do is go through each of these, and I'll draw a happy face by the one or two that I think are especially good at getting content out of Google, or at preventing it from getting into Google in the first place. The first thing a lot of people say is: OK, I just won't link to the page; it's a secret page on my server, so Google will never know about it or find it, and I won't have to worry about it showing up in the search engines. This is not a great approach, and I'll give you a very simple reason why. We see this all the time: somebody surfs to that page and then surfs on to another web server, and the browser sends a Referer header in the HTTP request, so the URL of your secret page shows up in that other web server's logs. If that other web server publishes its top referrers, maybe as clickable hyperlinks, then Google can crawl that server and find a link to your so-called secret web page. So it's very weak to say, "You know what, I won't link to it, I'll just keep it a secret, and no one will ever find out about it." For whatever reason, somebody will link to that page, or refer to it, or accidentally expose it, and if there is a link on the web to that page, there is a reasonable chance we might find it. So I don't recommend that approach; it's relatively weak.
Another thing you can do is something called .htaccess. OK, that sounds a little technical, so let me explain it very simply. It's a small file that lets you do simple things like redirect from one URL to another; the feature I am specifically talking about is that it can password protect a subdirectory, or even your entire site. Now, I don't think we provide a .htaccess tool in the webmaster tools, but that's OK; there are a lot of them out on the web. If you do a simple search like ".htaccess tool" or ".htaccess wizard", you will find one that will ask which directory you want to password protect and generate the file for you, and you can just copy and paste that onto your website.
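As a rough sketch of what such a generated file looks like (the paths and realm name here are made up for illustration; a wizard will fill in your own), a password-protecting .htaccess placed in the directory you want to protect might read:
AuthType Basic
AuthName "Private area"
AuthUserFile /home/example/.htpasswd
Require valid-user
The matching .htpasswd file holds the usernames and hashed passwords; on most servers you can create it with Apache's htpasswd command, for example: htpasswd -c /home/example/.htpasswd yourusername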
So why is this strong? Why am I going to draw a happy face here? Well, you've got a password on that directory. Googlebot is not going to guess that password, so we are not going to crawl that directory at all, and if we can't get to it, it will never show up in our index. This is very strong and very robust from the search engine's point of view, because someone has to know the password to get into that directory. So this is one of the two ways I really, really recommend. It is a preventive measure: if search engines already got in, the content was already exposed on your site. So if you plan in advance and you know which areas are going to be sensitive, just put a password on them and it will work really well.
OK, here is another way, one that a lot of people know about, called robots.txt. This one has been around for over a decade, since at least 1996, and essentially it's like an electronic no-trespassing sign: it says, here are the areas of your site that Google or other search engines are not allowed to crawl. We do provide a robots.txt tool in the webmaster console, so you can create a file, test out URLs, and see whether Googlebot is allowed to get to them. You can test the different variants of Googlebot, like whether the image Googlebot is allowed in, and you can take a new robots.txt file for a test drive: you can say, how about I try this as my robots.txt, could you crawl this URL, could you crawl this other URL, and just try it out and make sure it works. That's nice, because otherwise you can shoot yourself in the foot: say you write a robots.txt with a syntax error and it lets everybody in, or keeps everybody out; that's going to cause a problem. So I recommend you take that tool for a test drive, see what you like, and then put the file live.
Now, robots.txt is kind of interesting, because different search engines have different policies on uncrawled URLs. I'll give you a very simple example. Way, way, way back in the day, sites like eBay.com and NYTimes.com didn't want anyone to crawl their sites, so they had a robots.txt file that said:
User-agent: *
Disallow: / (that is, everybody is kept out of everything)
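For comparison, and as a hypothetical sketch rather than anything shown in the video, a more typical robots.txt fences off only specific directories while leaving the rest of the site crawlable:
User-agent: *
Disallow: /private/
Disallow: /signup/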
A blanket Disallow: / like the eBay and NYTimes one will keep out every well-behaved search engine. But that is kind of problematic: you are a search engine, somebody types in "ebay", and you cannot return eBay.com; it looks kind of dumb. So what we decided, and what our policy still is, is that we will not crawl the page, but we may show an uncrawled reference to it, and sometimes we can make that look pretty good. For example, if there is an entry for nytimes.com in the Open Directory Project (ODP), we can show the ODP snippet for nytimes.com as an uncrawled reference, which is good for users even though we were not allowed to crawl the page and in fact did not crawl it. So robots.txt prevents crawling, but it won't completely prevent a URL from showing up in Google. There are other ways to do that.
Let's move on to the NOINDEX meta tag. What that says, for Google at least, is: don't show this page at all. If we find NOINDEX, we will completely drop the page from Google's search results; we will still crawl it, but we won't show it if somebody does a search query that matches that page. So it's pretty powerful, it works very well, and it's very simple to understand. There are a couple of complicating factors, though. Yahoo and Microsoft, even if you use the NOINDEX meta tag, can still show a reference to that page; they won't return the full snippet and things like that, but you might see a link to it. We also see some people having problems with it: say you are a webmaster, you put a NOINDEX meta tag on your site while you are shifting things around and developing it, and you forget to take the tag down. A very simple example: the Hungarian version of the BMW site, I think, has done this, and there is a musician (Harper), whom you have probably heard of and who is pretty popular, whose site still has a NOINDEX meta tag; if you are the webmaster of that site, we would love for you to take it down. Various people at Google have suggested that maybe we should not show the snippet for such a URL but should still show a reference to it. There is one other corner case with NOINDEX: we can only abide by the meta tag if we have crawled the page. If we haven't crawled the page, we haven't seen the meta tag, so we don't know it's there. In theory, if someone links to the page and we haven't yet had a chance to crawl it, we won't see the NOINDEX and we won't drop it out completely. So there are a couple of cases where at least a reference will show up in Google, and Yahoo and Microsoft will pretty much always show a reference to the page even if you use the NOINDEX meta tag.
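For reference, the NOINDEX meta tag itself is a single line in the page's head section; this is the standard form of the tag rather than anything specific to the video:
<meta name="robots" content="noindex">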
Here is another approach you can use: the nofollow attribute, which is added to individual links. This is another fairly weak approach. Say there are 20 links to a page, and I am going to put a nofollow on all of them. Maybe it's a sign-in page; maybe you are Expedia.com and you want to put nofollow on the links to "my itineraries", which makes perfect sense: why would you want Googlebot to crawl into your itineraries when that's a personalized page? But inevitably somebody else links to that page, or you forget one page, and not every single link ends up with a nofollow.
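The attribute itself looks like this (the URL here is just a placeholder):
<a href="http://www.example.com/my-itineraries" rel="nofollow">My itineraries</a>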
To draw a very simple example, suppose we have page A, and it has a nofollow link to page B. We won't follow that link; we drop it out of our link graph completely, so we won't discover page B because of that link. But now say there is another guy with page C who also links to page B. We might follow that link and eventually end up indexing page B. You can try to make sure every link to a page is nofollowed, but it's hard to guarantee that every single link is nofollowed correctly. So, like NOINDEX, this has some weird corner cases: a page can easily get crawled because not every link to it carries a nofollow, or, in the NOINDEX case, we can get to the page, crawl it, and only later see the NOINDEX tag.
Let's move on to another powerful method; I tend to use this one whenever a forum gets hit by a porn spammer. That's the URL removal tool. .htaccess is great as a preventive measure: you put a password on a directory, no one can guess it, no search engine gets in, and the content never gets indexed. But if you did let the search engines in and you want to take content down later, you have the URL removal tool. We have offered the URL removal tool for at least five years, probably more; for a long time it sat on pages like services.google.com, and it's completely self-service and runs 24/7. Just recently the webmaster console team integrated the URL removal tool into the webmaster console, so it's much, much simpler to use and the UI is much better. It removes a URL for six months. Previously, if you made a mistake, say you removed your entire domain when you didn't need to, you had to email Google user support and say, "Hey, I didn't mean to remove my entire site, can you revoke that?", and someone at Google had to do it for you. Now you can do it yourself: at any time you can go into the webmaster console, say you didn't mean to remove your entire site, and that request gets revoked very quickly. So it's powerful and easily accessible in the webmaster console.
To use the webmaster console, it's not hard to prove that you are the owner of the site: you just put a little text file at the root of the site with a signature that says, yep, this is my site. Once you have proved the domain is yours, you get a lot more stats, plus this wonderful little URL removal tool. It works at a very nice level of granularity: you can remove a whole domain, you can remove a subdirectory, and you can even remove individual URLs. You can also see the status of every URL you have asked to be removed: a request initially shows as pending, and later it shows as processed, meaning the URL has been removed. If the removal was a mistake, or the sensitive content, the credit card numbers, Social Security numbers, or whatever you had there, has been taken down and the pages are safe to crawl and index again, you can revoke the removal, and the status changes to revoked.
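As a rough illustration of that verification step (the file name below is invented; the webmaster console generates a unique one for each site), proving ownership amounts to uploading an empty file with the name Google gives you to your site's root, something like:
http://www.example.com/google1234567890abcdef.html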
So those are the ways to remove URLs, or to keep URLs from showing up in Google in the first place. There are a lot of different options. Some of them are quite strong, like robots.txt and NOINDEX, but they have those weird corner cases where we might still show a reference to the URL in various situations. The ones I definitely recommend are .htaccess, which keeps search engines and people from getting to the content in the first place, and, for Google, the URL removal tool: if URLs you don't want in Google's index have already been crawled, you can still get them out, and get them out relatively quickly.
Thanks very much. I hope that was helpful.
Matt Cutts.
Why did we prepare this video transcript?
We know this video is more than a year old, but there are still people who have questions about their sites and want to hear from a search engine expert. There are also millions of non-English speakers who want to know what's in this video, and a transcript can easily be translated into other languages. We also know that people with hearing disabilities browse our site; this is a friendly version for them, so they can read and understand what's in the video.
This transcript is copyright Search Engine Genie.
Feel free to translate it, but make sure proper credit is given to Search Engine Genie.
Labels: Matt Cutts, Matt Cutts Video Transcript, Video Transcripts, Videos