Wednesday, 6 May 2015

Download recursively all files from a certain directory listing using wget

This is going to be a quick blog post about wget, and a trick I believe is well worth knowing. From a Linux box you can use wget to recursively download all the files exposed by a directory listing. 

If you have seen something similar to Figure 1, then you know what a directory listing looks like. If someone wants to give you access to their files on a web server over HTTP, it is a quick and easy way of doing it, but most of the time it is a misconfiguration that leaves the hosted files publicly available to unauthorised users. 

Figure 1 - Directory Listing

Let's assume that you want to download all the files from:

There might be other directories there alongside admin, such as admin2, user, old, etc., but you only want to download all the files and sub-directories inside the backup directory, recursively. 

This can be done with the following command: 

wget -r -np -nH --cut-dirs=1 -R index.html

More specifically, the above command will download all files and sub-directories within backup/, thanks to the switches provided:

-r -> download recursively
-np -> no parent: never ascend above the specified directory (backup/)
-nH -> do not create a local folder named after the hostname
--cut-dirs=1 -> omit the first 1 directory component(s) of the remote path (admin/)
-R index.html -> reject (exclude) the index.html files generated by the listing
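To make the path handling concrete, here is a hedged sketch of how -nH and --cut-dirs=1 together rewrite the path a file is saved under, reproduced with plain shell string trimming. The host example.com and the file db/dump.sql are placeholders for illustration, not from the original listing:

```shell
# A remote file as wget would see it during the recursive crawl
# (example.com and db/dump.sql are hypothetical):
remote="http://example.com/admin/backup/db/dump.sql"

path="${remote#http://}"   # strip the scheme
path="${path#*/}"          # -nH: drop the example.com/ hostname folder
path="${path#*/}"          # --cut-dirs=1: drop the first directory (admin/)

echo "$path"               # prints: backup/db/dump.sql
```

wget applies the same trimming to every file it retrieves, so the whole tree lands neatly under a local backup/ folder instead of example.com/admin/backup/.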

Note: If the download fails for any reason midway, you can add -nc -c to skip the already downloaded files, resume any partially downloaded ones, and continue with what is left:
wget -r -np -nH --cut-dirs=1 -nc -c -R index.html

Now, just in case you want to download everything without any restrictions, you can use the command:

wget -m

In this case it doesn't matter that you pointed the URL at /admin/backup/: using the switch -m [URL], wget will start downloading everything from
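For reference, the wget manual documents -m (mirror) as shorthand for a combination of other switches; a sketch with a placeholder host:

```shell
# Per the wget manual, -m is currently equivalent to:
#   -r -N -l inf --no-remove-listing
# example.com is a hypothetical host, not from the post:
wget -m http://example.com/
```

-N turns on timestamping and -l inf lifts the default recursion depth limit of 5, which is why -m grabs everything it can reach.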
