Prevent The Wayback Machine from Archiving Your Pages (and Delete All History!)

The Wayback Machine by Archive.org is not necessarily a bad thing. In case you don’t know what I am talking about, it’s a tool hosted at archive.org that crawls the web and archives web pages.
This means that you or anyone else can go there, enter your URL, and see how your website looked years ago. That can be very useful when you accidentally lose a page. It also means that anyone can browse old versions of your website, read content (even content you have since deleted), and do pretty much anything else they might want with it! Whenever I am curious about a website and have some time to kill, I go there and see what it looked like in the past. Sometimes it can be very revealing!

Now, what if you don’t want to be in their archive? What if you don’t want anyone to see how you started, with your embarrassing newbie mistakes and ugly design? What if you wrote something embarrassing and then deleted it? You definitely don’t want anyone to find it years later!

Fortunately, there is a fix for this. Open your robots.txt file and add the following:

User-agent: ia_archiver
Disallow: /

This will prevent The Wayback Machine from archiving your pages in the future and will delete all of your history (at least from public view; I don’t know whether they keep a copy for themselves). I did this with one of my websites less than 24 hours ago, and the history for that website is already gone.
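If you want to confirm that the two lines above say what you intend before relying on them, one option is to run them through Python’s standard-library robots.txt parser, `urllib.robotparser`. This is only an illustrative sketch: the function name `is_blocked` and the example.com URL are hypothetical, and it shows how a standards-following parser reads the rules, not whether Archive.org actually honors them.

```python
from urllib.robotparser import RobotFileParser

# The rules from this post, exactly as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that follows robots.txt and identifies
    itself as `user_agent` would be refused `url` under these rules.
    (`is_blocked` is a hypothetical helper name, not a library function.)"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/old-page.html"  # hypothetical URL
    print("ia_archiver blocked:", is_blocked(ROBOTS_TXT, "ia_archiver", page))
    print("Googlebot blocked:", is_blocked(ROBOTS_TXT, "Googlebot", page))
```

Run as a script, it should report `ia_archiver` as blocked and `Googlebot` as allowed, because the rule names only that one user agent; the rest of your site stays open to other crawlers.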

The only remaining question: is there anything else similar to archive.org? I have no idea. Do you?

UPDATE: I have experimented a little more with this. Here is what happens: once you add the rules above to your robots.txt file, your archive history becomes unavailable after around 24 hours. However, if you remove the rules again, all of your history becomes available again. This was very handy when I realized I had lost one of my old pages: I removed the disallow rules, waited a little, retrieved my page, and then put the rules back. I could probably do this only because Archive.org already had a copy of my page from before I blocked it. I don’t think this would be possible for new sites that disallow Archive.org right from the beginning.
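To check from a script whether a page’s history is currently visible (for example, while waiting out the roughly 24 hours described above), you could query the Wayback Machine’s public availability endpoint. This is a sketch under assumptions: it assumes the `https://archive.org/wayback/available` endpoint and its JSON response shape (`archived_snapshots`, then `closest`, with `available` and `url` fields) still work as the Internet Archive has documented them, and the helper names are hypothetical. The network call is kept under `__main__` so the parsing logic can be tried offline.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Internet Archive's public availability endpoint; its
# behavior and response format may change over time.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(site: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL; `timestamp` is an optional YYYYMMDDhhmmss prefix."""
    params = {"url": site}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the URL of the closest public snapshot, or None if none is shown."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


if __name__ == "__main__":
    site = "example.com"  # hypothetical: replace with your own domain
    with urlopen(availability_url(site), timeout=30) as response:
        snapshot = closest_snapshot(json.load(response))
    print(snapshot or "No publicly visible snapshot")
```

An empty `archived_snapshots` object would be consistent with your history being hidden, but treat it as a hint rather than proof, since a page may simply never have been captured.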

4 Replies to “Prevent The Wayback Machine from Archiving Your Pages (and Delete All History!)”

  1. I think you should let Archive.org have your information, because if you post something on the internet that you did not want there, why did you put it on the web in the first place? And nobody cares about the “ugly design” on your website. It was a long time ago, and everybody knows it looks like that just because it’s old. I think you should keep it there so people can discover things like what it looked like in the old days. Let’s just say I put a picture of myself on the internet that I did not want anyone else to see. That just means I shouldn’t have put it on the internet.
    This is just my opinion, so don’t take it seriously.

    1. You are definitely right; however, sometimes you might put something online and then regret it. Or you might start small, grow big, and gain new affiliations or partnerships, and you might not want them to know the detailed history of your site. If you have ever deleted a page from your site, you know the feeling: you delete a page because you don’t want anyone to see it, but archive.org allows anyone to see your deleted pages, and this is not something I appreciate as a webmaster.

  2. If someone doesn’t want old versions of their website being saved and shown to the world, then that should be their prerogative. It’s nobody else’s business, period.
