The Wayback Machine by Archive.org is not necessarily a bad thing. In case you don’t know what I am talking about, it’s a tool located at archive.org that crawls the web and archives web pages.
This means that you, or anyone else, can go there, enter your URL and see how your website looked years ago. It can be very useful when you accidentally lose a page. It also means that anyone can browse old versions of your website, read content (even content you deleted), and do pretty much anything else they might want to do! Whenever I am curious about a website and have some time to kill, I go there and see what they did in the past. Sometimes it can be very revealing!
Now, what if you don’t want to be in their archive? What if you don’t want anyone to see how you started, with your embarrassing newbie mistakes and ugly design? What if you wrote something embarrassing and then deleted it? You definitely don’t want anyone to find it years later!
Fortunately, there is a fix for this. Open your robots.txt file and add this:
User-agent: ia_archiver
Disallow: /
This will prevent the Wayback Machine from archiving your pages in the future and will also remove all of your existing history (at least from public view; I don’t know whether they keep a copy for themselves). I did this with one of my websites less than 24 hours ago, and my history for that website is already gone.
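If you want to verify that your history really has disappeared without clicking around the site by hand, Archive.org exposes a small public “availability” API at archive.org/wayback/available that reports the closest archived snapshot for a URL. Here is a minimal sketch of how you might query it; the helper function names are my own, and the JSON parsing assumes the response shape that API currently returns:

```python
# Sketch: ask the Wayback Machine whether it still serves snapshots of a URL,
# via Archive.org's public availability API. Function names are illustrative.
import json
import urllib.request

API = "https://archive.org/wayback/available?url="

def parse_availability(payload: str):
    """Return the closest snapshot URL, or None if nothing is publicly
    archived (e.g. after a robots.txt disallow takes effect)."""
    data = json.loads(payload)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None

def check_wayback(url: str):
    """Fetch the availability record for `url` and return a snapshot URL or None."""
    with urllib.request.urlopen(API + url) as resp:
        return parse_availability(resp.read().decode("utf-8"))
```

After adding the disallow rule, you could run check_wayback("yoursite.com") once a day; when it starts returning None, your history has dropped out of public view.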
The only thing is… is there anything else similar to archive.org out there? I have no idea. Do you?
UPDATE: I have experimented a little more with this. Here is what happens: once you add the code above to your robots.txt file, your archive history becomes unavailable after around 24 hours. However, if you remove the code again, all your history becomes available once more. This came in very handy when I realized I had lost one of my old pages: I removed the disallow rule, waited a little, retrieved my page, and then put the rule back. The reason this worked is probably that Archive.org had already crawled my page in the past. I don’t think this would be possible for new sites that disallow Archive.org right from the beginning.