wget download all files on page/directory automatically recursively

Have you ever found a website/page that has several or perhaps dozens, hundreds or thousands of files that you need downloaded but don't have the time to manually do it?

wget's recursive function called with -r does that, but also with some quirks to be warned about.
If you're doing it from a standard web based directory structure, you will notice there is still a link to .. and wget will follow that.

Eg. let's say you have files in http://serverip/documents/ and you call wget like this:
wget -r http://serverip/documents, it will get everything inside documents but also browse up to .. and basically download every traversable file that can be followed (obviously this is usually not your intention).

Another thing to watch out for is trying to use multiple sessions to traverse the same directory.
By default wget will overwrite all files in place that it finds are duplicates.  The -nc option stops it from doing it, but I prefer the -N option which compares the time and size of the local and remote files and resumes if necessary and ignores them if they are the same (it doesn't compare by checksum though).  I think -N is what most will find makes sense for them.

Avoid traversing outside of the intended path, by using -L for relative only.


Best Way To Use wget recursively

wget -nH -N -L -r http://serverip/path

-nH means no host directory, otherwise you'll get a structure downloaded that mirrors the remove path which can be annoying.

Eg. it would create serverip/path/file

-N tells us to resume files if they are incomplete but if the remote file is newer or bigger, then resume/overwrite.  Otherwise nothing is done, the file is skipped since there's no sense in downloading the same thing again and overwriting.

-L says stay in the relative path and is the behavior that you probably wanted and expected without using -L

-r is obvious, it means recursive and to download from all links in the specified path

But even the above still does some annoying things, it will traverse as many levels as it can find and see.

Latest Articles

  • failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
  • pcnet32: eth0: transmit timed out, status 97fb, resetting - NIC card problem solution
  • Linux Screen How To Scroll Up and Down
  • Directadmin Install Segfault Error
  • Could not display "trash:///". Error: DBus error org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout
  • SSH error slow login debug1: An invalid name was supplied Cannot determine realm for numeric host address - Solution
  • How To Install CPanel
  • LOG: MAIN PANIC failed to expand condition "${if eq {$authenticated_id}{}{0}{${if eq {$sender_address}{$local_part@$domain}{0}{${if match{$received_protocol}{N^e?smtps?a$N}{${perl{checkbx_autowhitelist}{$authenticated_id}}}{${if eq{$received_prot
  • Firefox 11 closes/quits without saving Open Tabs Prompt Solution/Fix
  • Firefox 11 stop hiding http:// and https:// solution fix
  • The Importance of a High Quality Power Supply/Power Supplies To Prevent Overheating/System Crash/Hardware Damage
  • Asus VE247H 23.7" Inch LCD/LED Backlit Monitor Dead/Stuck Pixel Policy Complaint
  • Firefox Error ./firefox-bin: error while loading shared libraries: libxul.so: cannot open shared object file: No such file or directory
  • Linux Ubuntu Nvidia GT430 Lockups/Errors/Freezes NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
  • Xen how to mount disk images off-line and access data
  • Xen non-HVM container won't work/boot anymore
  • how to exit xen console session from xm
  • Skype Linux/Ubuntu Sound Echo/Distortion Poor Quality Problem Fix Solution
  • Ubuntu 10.04 Flash Videos have tearing/lines Solution
  • File /etc/vz/conf/ve-vps.basic.conf-sample not found: No such file or directory - Openvz Error solution