How to Use HTTrack to Copy a Website to Your Hard Disk

We were talking about transferring a website to another host, and I promised to explain every step outlined in that article in detail. If you are not transferring from SBI!, you probably won’t have to do this yourself: most hosts will help you out and transfer your files for you for free. However, you might want to copy your website for some other reason, such as to view it offline or to keep a backup.

We will be using HTTrack Website Copier, which is a free tool. You can download it here. I made both a video tutorial and a written tutorial. Some people find HTTrack’s settings a little confusing; if you are one of them, you can use both. In addition, some HTTrack questions, such as errors, are covered in my forum. If you have any difficulties, feel free to post your question there.

NOTE: If you are blocking access to some parts of your site via robots.txt (.pdf files for ebooks or reports, for example), HTTrack will not copy them by default. You will need to get them manually.
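
To see in advance which files are likely to be skipped, you can test a few of your URLs against your robots.txt rules. Below is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt text, the example.com URLs and the function name blocked_urls are made up for illustration, and Python's parser may not read every rule exactly the way HTTrack does.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt content and URLs, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /ebooks/
"""

SAMPLE_URLS = [
    "http://www.example.com/index.html",
    "http://www.example.com/ebooks/report.pdf",
]

if __name__ == "__main__":
    for url in blocked_urls(SAMPLE_ROBOTS, SAMPLE_URLS):
        print("Blocked by robots.txt, copy manually:", url)
```

Anything the script lists is a candidate for manual copying after the mirror finishes.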

Here is what you’ll have to do:

– Decide where you want to save your website. I suggest that you create a special folder for this. I created a folder called “MyWebsite” for this purpose.

– Open HTTrack and enter the following settings:

[Screenshot: HTTrack project settings]

Project name and category can be anything you want. Base Path should be the folder where you want to save your website’s copy.

– Click “Next”.

– Action should be set to “Download Website”.

– Enter your website’s URL in the URL box.

– If your goal is only to view your website offline, you can go ahead and click “Next” without modifying the preferences.

However, if your goal is to upload this website to a server AND your website has outgoing links, such as affiliate links and links to other websites, it is better to make some modifications. Here is how:

– Click on “Set Options”, then click on “Experts Only”.

– Choose “Absolute URL” from the dropdown menu, like this:

[Screenshot: HTTrack URL settings]

Now click “OK”, then “Next”, then “Finish”. (If you don’t do this, all your outgoing links will be broken.)
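
If you want to confirm that the outgoing links in a saved page still point to other sites, a short script can list them for you. This is only a sketch using Python's standard library, not a feature of HTTrack. The sample HTML, the example.com address and the names LinkCollector and outgoing_links are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def outgoing_links(html, page_url):
    """Return absolute URLs of links that point to a host other than page_url's."""
    collector = LinkCollector()
    collector.feed(html)
    own_host = urlparse(page_url).netloc
    result = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolves relative links against the page
        host = urlparse(absolute).netloc
        if host and host != own_host:
            result.append(absolute)
    return result


# Hypothetical page content and address, for illustration only.
SAMPLE_PAGE_URL = "http://www.example.com/index.html"
SAMPLE_HTML = (
    '<a href="about.html">About</a> '
    '<a href="http://www.amazon.com/dp/123">Buy the book</a>'
)

if __name__ == "__main__":
    for link in outgoing_links(SAMPLE_HTML, SAMPLE_PAGE_URL):
        print("Outgoing link:", link)
```

Run against one of your mirrored pages, every link it prints should be a complete address on someone else's site.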

HTTrack will now run on its own, and you will see something like this:

[Screenshot: HTTrack work in progress]

Wait until it finishes. HTTrack is pretty fast. At the end you will see this on your screen:

[Screenshot: HTTrack success]

Click on “View log file” to check for errors. If you find any, make a note of them, find the pages that show errors, and fix them manually. You rarely get more than a few errors. If there are too many, it could be due to a hiccup in the connection. In that case it is easier to run HTTrack again and get another copy, which should be better.
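
If the log is long, a small script can pull out just the error lines. The sketch below assumes that error lines look like the ones readers posted in the comments under this article (a time, "Error:", a quoted reason, a status code, the link, and the page it came from). Other HTTrack versions may format the log differently, and the name parse_errors is my own.

```python
import re

# Pattern for lines such as:
#   22:56:46 Error: "Forbidden" (403) at link redefinegaming.net/ (from primary/primary)
# Straight and curly quotes around the reason are both accepted.
ERROR_LINE = re.compile(
    r'^(?P<time>\d{2}:\d{2}:\d{2})\s+Error:\s+'
    r'["“](?P<reason>[^"”]*)["”]\s+\((?P<status>\d{3})\)\s+'
    r'at link\s+(?P<link>\S+)\s+\(from\s+(?P<source>[^)]*)\)'
)


def parse_errors(log_text):
    """Return one dict per error line found in the text of hts-log.txt."""
    errors = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            entry = match.groupdict()
            entry["status"] = int(entry["status"])
            errors.append(entry)
    return errors


# Sample lines adapted from logs quoted in the comments.
SAMPLE_LOG = """\
note: the hts-log.txt file, and hts-cache folder, may contain sensitive information,
22:56:46 Error: "Forbidden" (403) at link redefinegaming.net/ (from primary/primary)
16:04:09 Error: “Not Found” (404) at link http://www.roadtripwise.com/road-trip-to-uncle-roberts-2011-page-1.html (from http://www.roadtripwise.com/utah-road-trip-2011-page3.html)
"""

if __name__ == "__main__":
    for error in parse_errors(SAMPLE_LOG):
        print(error["status"], error["link"], "linked from", error["source"])
```

The "linked from" page is the one to open when you want to fix or remove a dead link.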

You can browse your mirrored website right from here or from the folder on your computer where you saved it. Best of all, HTTrack preserves the file structure of your website.

If you are transferring your files to another host, double-check all files and images. Something might be duplicated, and a file or two might be missing. Fix anything that has to be fixed manually. You are now ready to upload this website to your new host. I wrote instructions on how to use FileZilla for uploading your website here. If you have any questions, you are welcome to open a thread in my forum here and we will try our best to help you out.
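
Checking for duplicated files by eye gets tedious on a larger site. As a rough aid, the Python sketch below groups files in the mirror folder that have identical contents. It assumes the copy was saved in a folder called "MyWebsite", as in my example above; the function name find_duplicates is hypothetical, and the script only reports duplicates, it does not delete anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Group files under root by content hash; return groups with more than one file."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]


if __name__ == "__main__":
    mirror = Path("MyWebsite")  # assumed folder name; change it to your own
    if mirror.is_dir():
        for group in find_duplicates(mirror):
            print("Identical files:")
            for path in group:
                print("   ", path)
```

Before deleting one copy from a group, check which file name your pages actually link to.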

Good luck with HTTrack!

39 Replies to “How to Use HTTrack to Copy a Website to Your Hard Disk”

  1. Hi,

    Thanks for the great explanation! (Sorry for my bad English)
    Now I have a question: what if you have some errors? How can you fix them?
    I have some images that are downloaded 2 or 3 times. Which one of them should I delete?

    I hope you’ll answer my question 🙂

    Thanks!

    1. Hi Jonathan,

      no, unfortunately you can’t. HTTrack simply copies the pure HTML of your site. You can copy your SBI site, upload it to a new host, and leave it in pure HTML; that would be OK too, especially if you don’t work on it anymore. However, if you want to move to WordPress, you will still have to rework each page manually, as described in my static to WordPress tutorial here: http://webmasterdiary.org/wordpress/transfer-static-website-to-wordpress/

  2. Sometimes I get the 403 error
    “HTTrack3.47-19+htsswf+htsjava launched on Thu, 20 Jun 2013 22:56:45 at http://redefinegaming.net/ […]
    Information, Warnings and Errors reported for this mirror:
    note: the hts-log.txt file, and hts-cache folder, may contain sensitive information,
    such as username/password authentication for websites mirrored in this project
    do not share these files/folders if you want these information to remain private
    22:56:46 Error: “Forbidden” (403) at link redefinegaming.net/ (from primary/primary)”

    What do I have to do to solve this problem?

  3. Thanks so much for all your generous help, Elena!

    I downloaded my site following the instructions, but I cannot browse the downloaded (mirrored) copy. (This is not something I need to be able to do, but you said to do it as a check.) The only way I can browse the downloaded copy is if I download it using “Relative URI” instead of “Absolute URI”.

    So, should I download it using Relative URI? Or is it OK if my downloaded copy is not browsable? Or am I doing something completely wrong? Thanks!

    1. Hello Ann, what do you mean you can’t browse it? What happens when you attempt to browse your copy?

      The reason I don’t suggest using relative URLs is that they break JavaScript-based ads, Facebook Like boxes, etc. Most people have something like this on their sites, so I find it easier to set absolute URLs right from the beginning.

  4. The mirrored copy of my home page opens in my browser, but all the CSS formatting is gone and all the images are missing (even though I can see that the image files did get downloaded). Then if I click on one of the links, it tries to connect to the internet to get the live version (because they are the absolute links, I guess). The relative-links mirror doesn’t have any of these problems. Both mirrors downloaded almost exactly the same number of pages, so it doesn’t seem to be a matter of missing files.

    I see a couple of threads on the HTTrack forum where others have had the same confusion. I’ll keep digging there.

    One more question:
    Is it normal to get a whole bunch of other files in my download, from sites other than my own? (apis.google.com, connect.facebook.net, graphics.sitesell.com, http://www.statcounter.com, etc.) I’m wondering if this is OK or if I should check the “No external pages” box to avoid downloading all these. (I won’t be uploading them to my new host, right?)

    Thanks again!

    1. I replied to you via email, but apparently something didn’t work. Styling is definitely a matter of CSS, so either the CSS file is missing or the reference to it is messed up. It has happened to me a couple of times; sometimes HTTrack just hiccups and you simply need to run it again. I would run it again using absolute URLs and everything should be fine. Going to the live URLs of your site when clicking on links is normal. The reason I suggest absolute URLs is that relative URLs will typically mess up your JavaScript-based ads if you have them (most people do).

      Yes, it’s normal to get Facebook and SiteSell graphics. Ignore these files; just delete them.

  5. OK! I didn’t realize that going to live URLs when clicking on links is normal.

    I guess what makes it confusing is that HTTrack was written to create a local mirror and does not seem to be used for moving a website very much.

    Many thanks for your help!

  6. Thank you so much for the tutorial, Elena.

    Unfortunately, when I try to download the mobile version of a website I use a lot, HTTrack does not manage to download anything. Could it be that the website server is identifying me as a PC and not a mobile phone, and as such blocks the whole process? Or am I missing something in the configuration?

    What’s funny is I can browse the web-based mobile site from my PC, but there is no way I can download it with HTTrack… Could you help me with this, please?

    Thanks.

  7. Hi Nico,

    unfortunately I can’t comment on that. I have never tried to download the mobile version of a site, but I would expect HTTrack to download the web-based version with all files, including any mobile files attached.

  8. Small question,
    I am trying to use a txt file to download only specific files.
    But I must have done something wrong; it doesn’t work.
    Do you have any idea what I should have in my txt file? A delimiter? etc.
    Thanks.

    1. Hi Francois,

      I am not sure I understood your question. If you need only a specific file, try entering the full path to that file instead of the main site’s URL. To be honest, I haven’t tried it, but I think this would work.

      1. That’s my issue.
        I put all my URLs in a txt file, but it doesn’t work.
        I don’t know if I have to put a special delimiter like ; or , between each line.
        Thanks for your help.
        Best regards,

        1. I am not sure, Francois; it’s not like I am an HTTrack pro. This tutorial was written with a particular group of people in mind, and I learned only what they need to know. It is definitely not sufficient for all kinds of purposes, but it works for simply copying a static website. I don’t know everything there is to know about HTTrack, sorry 🙂

        2. Francois,
          Instead of putting all your URLs in a txt file, you want to enter them in the “Web Addresses” box during the HTTrack process. (But this should only be necessary for files that cannot be reached via links from your home page, assuming you also have the home page URL in the Web Addresses box.)

    2. Francois,
      Elena is right. In the “Web Addresses” box, where you normally put the site URL, just put the URL of your txt file. (I put a list of files here in addition to my site URL, to pick up all the extra little non-html files that are on my site and are not spidered.)

  9. Hi, my problem is this: I tried to copy only specific files (*.djvu) from a site. Manually it is possible: go to the page, right-click on a link, save.

    But HTTrack refuses to do it.

    In the settings I wrote:
    -*
    +www.neededsite.com/*.djvu

    It did not work. The reason is probably that the page describing the file is in one directory (e.g. neededsite.com/descr/finename.html)

    but the link to the file points somewhere else (e.g. neededsite.com/upload/filename.djvu).

    There might be a robots.txt, but I tried -s0.

    Still nothing in the end; the log file shows nothing.

    Any ideas or suggestions?

    (The site in question is http://www.runivers.ru)

    thanks in advance

  10. I am trying to make a backup copy of my website. I am following the steps above, but I keep getting the message “ensure the website still exists (which it does) or check your proxy settings.”

    Then when I click on the log I get: the hts-log.txt file and hts-cache folder may contain sensitive information, such as username/password authentication for websites mirrored in this project. Do not share these files/folders if you want this information to remain private.
    Warning: due to http://www.XXXXXXXXX.com remote robots.txt rules links beginning with these paths will be forbidden/,/checkout (see in the options to disable this)
    Warning: moved permanently for http://www.XXXXXXX.com
    Warning: File has moved from http://www.xxxxxx.com/ to index.html
    Warning: No data seems to have been transferred during this session. restoring previous one.

    OK. What do I do to fix this and get a copy of my website? Any help would be appreciated, thanks.

    1. HTTrack will not copy files blocked by robots.txt. As for the other issues, I don’t have an answer; it must be something about your permission settings.

  11. Does anyone know the meaning of this?
    HTTrack has detected that the current mirror is empty. if it was an update, the previous mirror has been restored. reason : the first page(s) either could not be found, or a connection problem occured
    I get this error when I try to download a particular site. It works well with some websites. What could be the problem?

  12. Please, Sir, I want to duplicate an eCommerce website, and I don’t know whether this WinHTTrack tool can do the work or how to go about it, as I do not own the source website or have its login details. It’s an eCommerce website, and I want to duplicate it and make custom changes so that it looks completely different. I will be waiting for anyone’s suggestions and help. Thanks

    1. Hi Kelvin, HTTrack can only copy HTML, CSS and images. It cannot copy anything that is dynamic, and it has no access to the database. You will be able to copy the design, but it won’t be usable unless you are willing to hand-code it or know how to connect it to a database and make it dynamic.

  13. Hi Elena,
    my problem is that I have included video and audio files (in the Scan Rules option), but HTTrack doesn’t download them, or it downloads them but with a small file size. Thank you.

    1. Hi Ziad,

      where do you host your videos? Is it YouTube or Vimeo? If so, you shouldn’t have any problem. You don’t need to modify anything in the scan rules.

      As for audio, I am not sure, but if it’s hosted externally on a service similar to YouTube, it should be the same. If it’s something you are hosting on your SiteSell hosting, it’s hard for me to tell without seeing or trying it myself. In any case, you can always upload your audio files manually.

  14. Hi Elena

    Thank you for what you do. But, I have run into a problem. I have my new account with A Small Orange. I have copied my website with HTTrack. I uploaded it with WinSCP. I made sure I only uploaded the files I needed (though that could be where I screwed up). From the directory I copied to with HTTrack, I went into the project folder, and then into the folder with my domain.

    When I go to check my work, I get a strange page with “Index of locally available projects:

    No categories”

    And then a link to my domain without the “www”

    This is all in a box with “HTTrack Website Copier – Open Source offline browser”

    When I click on the link for my domain, I get “404 Not Found

    The server can not find the requested page:

    sterling.asoshared.com/~roadtri2/roadtripwise.com/index.html (port 80)

    Please forward this error screen to sterling.asoshared.com’s WebMaster. ”

    I hope all this info is clear enough for you to help me out.

    Thank you
    Andrew

    1. Hi Andrew,

      yes, it seems you uploaded them to the wrong place. Have you used my “Under 1 hour” tutorial? With an ASO account you have free access to that tutorial. It has better screenshots and should help you fix the problem. Once again, here is what to look for:

      • 1. you should upload only files that belong to your site (without folders created by Httrack),
      • 2. these files should go to public_html of your cPanel’s File Manager.

      It is hard to tell from the error exactly what your problem is, but most likely the files were uploaded to the wrong directory. You can log in to http://sterling.asoshared.com/cpanel, then go to File Manager and see what’s going on first hand. If you can’t find them immediately, you can use the search function (top right). Just enter any page from your site, like “types-of-roads.html”, in the search field. It should show you where it is.

      For example, if it shows that this page is in public_html/somefile/types-of-roads.html, it means you need to navigate to “somefile”, select this page and all other files and pages, and move them all to public_html. The proper path should be public_html/types-of-roads.html

      When you find all your files, you can select them all, right-click, choose “MOVE”, and then type /public_html in the field that appears.

      This will move all files at once. After that you should be able to preview your site without any problem.
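
      If you would rather fix the layout in your local copy before uploading, the same move can be done with a few lines of Python. This is only a sketch: the folder names "public_html" and "somefile" are the examples from this reply, and move_up is a made-up helper name. It keeps subfolders intact and skips anything that would overwrite an existing file.

```python
import shutil
from pathlib import Path


def move_up(public_html, subfolder):
    """Move everything in public_html/subfolder up into public_html.

    Returns the names that were moved. Anything whose name already exists
    in public_html is left where it is, so nothing gets overwritten.
    """
    parent = Path(public_html)
    source = parent / subfolder
    moved = []
    for item in sorted(source.iterdir()):
        target = parent / item.name
        if target.exists():
            continue  # leave name clashes for manual review
        shutil.move(str(item), str(target))
        moved.append(item.name)
    return moved


if __name__ == "__main__":
    root = Path("public_html")  # assumed local folder that mirrors the server layout
    if (root / "somefile").is_dir():
        print("Moved:", move_up(root, "somefile"))
```

      Anything left behind in the subfolder afterwards is a name clash that you should look at by hand.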

      1. I did have all my image files, style sheet files (I think that’s what they were…*.css), and some others in subfolders. I moved them all into the public_html folder, and now I can check my homepage. But as soon as I click on a NAV button to see another page, the address bar shows that I’m looking at the actual webpage…I believe the one at SBI. Should I be able to see all the pages with “http://sterling.asoshared.com/~roadtri2/” in the address bar?

        1. I looked at your site and although it looks fine now, there is a problem:

          1. Images and stylesheets SHOULD be in the image-files (or images) and support-files folders, but THESE folders have to be in your public_html. Regular pages should be in public_html without any additional folder.

          The goal is to keep exactly the same structure. This is why you need to preserve the support-files and image-files (or images) subfolders: this is how they were stored in SBI!. When you open your File Manager in cPanel and go to public_html, you should see a structure similar to this:

          The only difference is that pages will show the .html extension. But the structure should be the same.

          As you can see, there are image-files, images and support-files subfolders.
          From what you told me and also from what I can see from your site, it seems you got all images and support files and put them at the same level as pages.

          For example this photo http://sterling.asoshared.com/~roadtri2/road-trip-with-kids.jpg shouldn’t be there. It should have temporary URL with “images” in it because it should be in folder called “images” and this folder should be located in public_html. So the complete URL would look: http://sterling.asoshared.com/~roadtri2/images/road-trip-with-kids.jpg

          It is partially my fault too because I assumed it was clear that I don’t mean folders that belong to support-files and images or image-files because of the context of the tutorial. But don’t worry, you can fix this easily.

          The site displays correctly because it loads images and style sheets from your live SBI! site, but if you switch name servers, the images and style will break. You have two ways to fix it:

          1. manually create the images and support-files folders in your public_html and sort everything as it should be (not recommended)
          2. go to public_html, delete all the files and folders you have there now, and re-upload the site. This time, make sure that you upload only what you need, and that it goes to public_html. The structure should look like in the image above.

          Hope this makes sense.

          2. Clicking on the menu and going to the live site is normal. That’s the URL you have hard-coded in your HTML, so it is not going to change. The way to preview pages is to add each page’s file name to the end of your temporary URL.

          Once again my under 1 hour tutorial is free and open for you. It should help you do this without mistakes. Let me know if you need more help 🙂
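
          If you have many pages to preview, the addresses can be built with a few lines of Python. A minimal sketch, using the temporary URL and file names from this thread as sample values; preview_url is a made-up helper name. It also adds the slash at the end of the temporary URL if it is missing, because without it the last part of the address gets replaced instead of extended.

```python
from urllib.parse import urljoin


def preview_url(temporary_url, filename):
    """Build a preview address from the host's temporary URL and a file name."""
    # urljoin replaces the last path segment unless the base ends with "/",
    # so add the slash if it is missing.
    if not temporary_url.endswith("/"):
        temporary_url += "/"
    return urljoin(temporary_url, filename)


if __name__ == "__main__":
    base = "http://sterling.asoshared.com/~roadtri2"
    for name in ("road-trip-cars.html", "images/road-trip-with-kids.jpg"):
        print(preview_url(base, name))
```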

          1. Wow! I am learning so much here. Thank you. I hope I am not asking too many questions; I appreciate the time you’re willing to spend helping us along.

            I re-uploaded the website without the unnecessary files, and I preserved the folder structure. Now I can check my work successfully. By the way, I think the reason I couldn’t before is because I typed in the address bar: “sterling.asoshared.com/~roadtri2” (without “/” at end)
            After I added a “/” on the end it worked fine.

            I still have two questions:

            1st question – HTTrack gave me these errors at the end of copying…
            16:04:09 Error: “Not Found” (404) at link http://www.roadtripwise.com/road-trip-to-uncle-roberts-2011-page-1.html (from http://www.roadtripwise.com/utah-road-trip-2011-page3.html)
            16:04:11 Error: “Not Found” (404) at link http://www.roadtripwise.com/www.amazon.com (from wms.assoc-amazon.com/20070822/US/js/link-enhancer-common.js?tag=rotrwi-20)
            16:04:11 Error: “Not Found” (404) at link http://www.roadtripwise.com/www.assoc-amazon.com (from wms.assoc-amazon.com/20070822/US/js/link-enhancer-common.js?tag=rotrwi-20)
            16:04:12 Error: “Not Found” (404) at link http://www.roadtripwise.com/ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ (from connect.facebook.net/en_US/sdk.js)
            16:04:12 Error: “Not Found” (404) at link http://www.roadtripwise.com/restserver.php (from connect.facebook.net/en_US/sdk.js)
            16:04:12 Error: “Not Found” (404) at link http://www.roadtripwise.com/dialog/ (from connect.facebook.net/en_US/sdk.js)
            16:04:12 Error: “Bad Request” (400) at link http://www.roadtripwise.com/%s/profile.php?id=%s (from connect.facebook.net/en_US/sdk.js)

            The first one doesn’t seem to be a problem, the link actually works fine for me. The rest are Amazon or Facebook related. Do I need to worry about those?

            2nd question – There are a couple folders in my Public_html folder I might not need. Can I delete these:

            /%s – contains “profile4d6d.html”

            /ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+ – contains “index.html”, which is weird because the Public_html folder already contains “index.html” and “index-2.html”

            /dialog – which contains another “index.html”

            There is also /sd folder which has two subfolders /image-files and /support-files, but they seem relevant with contents like style.css and nav-image.gif

            There is also another /support-files in the Public_html folder, but that also seems important.

            I hope my lack of savvy is not too much of a burden. I thank you for whatever support you can give me in this intense transition.

            Thank you
            Andrew

          2. Hi Andrew,

            the site looks fine to me now. I checked only two pages but the pages I saw looked correct. You can check any page by adding its filename to the end of temporary URL like this example: http://sterling.asoshared.com/~roadtri2/road-trip-cars.html

            As for “/”, I didn’t pay attention, but I don’t think it ever caused me a problem the way you describe. I clicked through the link you sent and it was a 404 error. Never mind, stuff happens. The most important thing is that you sorted it out.

            As for the errors, the “uncle Roberts trip” page is apparently missing on your SBI site but is linked from another page. This is why HTTrack warns you about this error. It also tells you that the link to this page is on your Utah page, so when you have time, it’s a good idea to go and remove the dead link.

            I wouldn’t worry about other errors like Amazon and Facebook. I have no idea what these letters are (didn’t see it before) but it looks like something related to Facebook.

            As for the “sd” folder, KEEP IT. I never saw this before, but apparently SBI! has made changes since I left. Your SBI site now has an sd folder which contains support files, so you should keep it. I think it has something to do with their BB2. Just keep it.

            I don’t know where dialog/index.html came from. I think it must be safe to remove it. Basically it would have a url like this (after the transfer): yoursite.com/dialog/index.html Did you have something like this? I don’t think so. If in doubt, leave it. You can sort it later.

            About AdSense, you are right. I said that it won’t show in the preview on an IP address. But I think it shows because you chose to use the name server URL (sterling.asmallorange.com) instead of the IP address (which would be something like 123.45.67.89). It’s not wrong to preview the way you do, but it’s something I didn’t teach in my tutorial. A server URL is just like any other URL, so AdSense can’t tell the difference, and that’s why it displays. Nothing to worry about.

            Don’t worry about all these mistakes that happened. Although you may have spent a little more time messing with your files in cPanel, you now have the advantage of understanding your site’s anatomy much better than someone who was able to follow the tutorial without any problems the first time. You understand how your URLs are built and how the folder structure works, and you even know how to move your files within File Manager, which is great! 😀

            This is how I learned — by making mistakes, breaking things and fixing them afterwards. I still do this whenever I am curious about how something works. Breaking and then trying to collect things back or change them slightly is a great way to learn and the knowledge you get is much deeper than if you just read about it.

            Hope I clarified everything and good luck with your transfer 🙂

  15. Thanks for the tutorial!
    I have a big problem, I wonder if you can help…
    I created a website in localhost with xampp and wordpress, and now I’m trying to mirror it with httrack to save a static version that I can finally move to the server.
    The problem is that it simply doesn’t work…
    The error is that “the mirror is empty” and it suggests to check if the url is correct or the proxy settings… I tried to change the proxy but nothing happened.
    anyway the software works well with other live websites, but I suppose there’s a compatibility issue with xampp.
    Do you have any idea?

    1. Hi Giorgiaboi,

      I never tried to use it that way. It very well could be it’s not suitable for this. Why don’t you just upload your WordPress database and wp-content to live server at your host?
