Your own website against you - nice writeup
"PITFALLS OF SAVING YOUR SITE FOR POSTERITY
Search engines automatically cache your pages, and the Internet Archive, through its Wayback Machine, also comes along and makes a permanent copy of your site for "posterity". The problem starts when you realize you may have content on your website that could result in legal issues. You may act quickly to resolve those issues, yet the problems remain without your knowledge because you didn't act as quickly as all the robots crawling your site.
Unfortunately, legal beagles gearing up to file a lawsuit love that your site was saved for "posterity". Although you've already done the right thing by cleaning potentially harmful things off your site, the tireless automatons crawling the internet have made sure there's plenty of evidence, and the next thing you know, you're about to get hung out to dry.
If you think the lawyers aren't technically savvy, think again:
http://www.law.com/jsp/legaltechnology/pubArticleLT.jsp?id=1202422968612
Not only can they find your content, they do it under cloak without your knowing about it!
You can forget your rights, just throw them out the window, because the history of your website is already busy squealing on you without your knowledge or permission.
HOW DO YOU PROTECT YOUR SITE FROM HISTORICAL SNOOPING?
Obviously the simplest way is to keep your nose clean so nobody has a reason to be snooping in the first place.
However, this is the internet and you have to OPT-OUT of things to protect your rights.
Here are a few preventive ways to stop your website from being archived and used as a snitch:
USE NOARCHIVE
Make sure you include the NOARCHIVE robots meta tag in each web page so that none of the major search engines keeps a cached copy.
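As a rough illustration, the tag usually takes the form of a robots meta element in the page head, and a short Python sketch can check whether a given page carries it. The sample page and the helper names (`RobotsMetaParser`, `has_noarchive`) are hypothetical, made up for this example only.

```python
from html.parser import HTMLParser

# Hypothetical sample page; the <meta name="robots"> line is the part that matters.
SAMPLE_PAGE = """<html>
<head>
<title>Example page</title>
<meta name="robots" content="noarchive">
</head>
<body><p>Hello</p></body>
</html>"""


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalize to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex, noarchive".
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noarchive(html_text):
    """Return True if the page declares NOARCHIVE in a robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noarchive" in parser.directives
```

Running `has_noarchive` over every page template before publishing is one way to catch a page that slipped through without the tag.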
USE ROBOTS.TXT
Block all of the archive site spiders, such as used by the Internet Archive, in your site's robots.txt file with an entry as follows:
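A minimal sketch of such an entry, assuming the Internet Archive's crawler is matched by the `ia_archiver` user-agent token it has historically documented for exclusions. The robots.txt text is embedded as a string so Python's standard `urllib.robotparser` can confirm that it blocks that agent while leaving others alone. The `is_blocked` helper and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt entry: "ia_archiver" is taken to be the token the
# Internet Archive's crawler responds to; "Disallow: /" blocks the whole site.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt, user_agent, url="http://www.example.com/"):
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)
```

With this entry, `is_blocked(ROBOTS_TXT, "ia_archiver")` reports the archive crawler as blocked, while an agent with no matching entry, such as `"Googlebot"`, is still allowed. Other archive spiders would each need their own `User-agent` block.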
The [url=http://crawler.archive.org/]Heritrix software[/url] used by the Internet Archive is Open Source, which means other archives may be running it too, possibly using modified versions of Heritrix that ignore robots.txt and cloak their access to your site.
HELP FOR HOSTED BLOGGER ISSUES
If you're running a blog hosted on a third-party service like Blogger or WordPress, your options may be limited to embedding NOARCHIVE, which the Internet Archive ignores, meaning anyone running stock Heritrix code would also ignore it by default.
The only way you can exclude such a site, [url=http://www.archive.org/about/exclude.php]according to their site[/url], is to contact them directly. Obviously too few businesses and site owners are aware of the perils posed by the Internet Archive, or it would honor the NOARCHIVE tag for sites with limited access and no robots.txt just to avoid a flood of emails.
OTHER POTENTIAL RISKS
Snap.com has taken screenshots of every web page, then Ask started taking limited screenshots, as did some new, completely graphical search engines like SearchMe. Some screenshots have minimal resolution and are too tiny to read, but others, like those from Snap and SearchMe, are big enough to read, and these too can be used as evidence in a lawsuit. Even the tiniest thumbnail can still show a licensed trademark being used without permission.
Some social bookmarking sites, such as Kaboodle, Jeteye and Eurekster, allow large chunks of content to be copied, some using tools like Heritrix (see above) to make small archive copies of specific content.
SUMMARY
Obviously there's no way you can completely stop anyone from making copies of your site, but it may pay to be diligent in keeping the technologies that provide any form of archive off your site.
This is just another form of insurance that could, in the end, save your business, your house, your car, your family... "
Labels: Search Engine Marketing