The Wayback Machine by Archive.org is not necessarily a bad thing. In case you don’t know what I am talking about, it’s a tool located at archive.org that crawls the web and archives web pages.
This means that you or anyone else can go there, enter your URL, and see how your website looked years ago. It can be very useful when you accidentally lose a page. It also means that anyone can browse old versions of your website, read its content (even if it was deleted), and do pretty much anything they might want to do! Whenever I am curious about a website and have some time to kill, I go there and see what they did in the past. Sometimes it can be very revealing!
Now what if you don’t want to be in their archive? What if you don’t want anyone to see how you started, with your embarrassing newbie mistakes and ugly design? What if you wrote something embarrassing and then deleted it? You definitely don’t want anyone to find it years later!
Fortunately, there is a fix for this. Go to your robots.txt and add a rule that disallows the Internet Archive’s crawler from your whole site.
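The rule usually quoted for this targets the user agent ia_archiver, which is the token the Internet Archive’s crawler has historically been associated with. Treat that token as an assumption and check Archive.org’s current guidance before relying on it. Here is a minimal Python sketch that holds the two-line rule and uses the standard library’s robots.txt parser to confirm it blocks that agent while leaving others alone. The helper name is_blocked and the example.com URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumption: "ia_archiver" is the user-agent token honored by the
# Wayback Machine's crawler. These two lines are what would go into robots.txt.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/old-page"  # hypothetical URL
    print("ia_archiver blocked:", is_blocked(ROBOTS_TXT, "ia_archiver", page))
    print("Googlebot blocked:", is_blocked(ROBOTS_TXT, "Googlebot", page))
```

Only the two lines inside ROBOTS_TXT belong in your actual robots.txt file; the rest is just a local check that the rule says what you think it says.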
This will prevent the Wayback Machine from archiving your pages in the future and remove all your history (at least from public view; I don’t know if they keep it for themselves). I did this with one of my websites less than 24 hours ago, and the history for that website is already gone.
The only thing is… is there anything similar to archive.org? I have no idea. And you?
UPDATE: I have experimented a little more with this. Here is what happens: once you add the code above to your robots file, your archive history becomes unavailable after around 24 hours. However, if you remove the code again, all your history becomes available again. This was very handy when I realized I had lost one of my old pages. I removed the disallow code, waited a little, got my page, then put the code back. The reason I could do this was probably that Archive.org already had a copy of my page from the past. I don’t think this would be possible for new sites that disallow Archive.org right from the beginning.
Yesterday I had to deal with a very frustrating issue. I was forced to use my old laptop, which I hadn’t used for a while, and whenever I typed the URL of my own website or any other website into the browser, I would be redirected to Ask.com search results (or ad results). This happened even though the URLs I was typing were correct, so it definitely felt like my Chrome browser had been hijacked.
The fix was quite simple (although I still have a hard time understanding why I would be redirected when my URLs were correct). Here is what to do:
- Click that strange button at the right of your browser, as shown in the image below. When you hover over it, it says: “Customize and control Google Chrome”. Then choose Settings from the drop-down menu.
Yeah, I always have that many tabs open 🙂
- A page will open. Scroll down to where you see:
“Set which search engine is used when searching from the omnibox.” Sure enough, you have Ask.com chosen. Change it to Google or another search engine you can live with. Voilà!
Most of us have been through the frustrating experience of discovering that our content has been scraped by another website. It is even more frustrating when you keep emailing the scrapers, begging them to take your content down, and receive no answer. When your website is still young, this kind of thing can really hurt you, and you might consider confronting the copycats with more serious weapons than simply begging them to remove your stuff from their sites.
WARNING: this is a radical method. It absolutely works, but it is better to give them a chance to correct their mistake by asking first. If you were ignored (which is usually the case) and they are running AdSense on your content, you can report them to AdSense for a policy violation. To do so, click Ad choices, scroll down until you see “Report a policy violation regarding the site or ads you just saw”, choose “website”, then choose “The site violates AdSense program policies in other ways”, explain what happened in a few words in the details area, leave your email, and you are done.
Now how does this get your content removed? It doesn’t. But in my experience it won’t take more than 2-3 days for their AdSense ads to disappear. Once the ads are gone, these sites typically lose their reason to continue. In one case the content stayed up for a while and then the entire site disappeared when the domain came up for renewal. In another case all the pages were deleted and replaced with other content, which they apparently hoped to monetize in some other way.
After a week or so, once the scraper site’s AdSense account has been banned, you will receive an automated email from Google suggesting that you file a DMCA complaint as well. Usually you won’t have to bother, because, as I said, the scrapers typically lose their reason for keeping their websites going and disappear from the face of the web entirely.