wget download all files on page/directory automatically recursively

Have you ever found a website/page that has dozens, hundreds or even thousands of files that you need to download, but you don't have the time to fetch them manually?

wget's recursive mode, enabled with -r, does exactly that, but it has some quirks to watch out for.
If you're downloading from a standard web server directory listing, you will notice there is still a link to the parent directory (..), and wget will follow it.

E.g. let's say you have files in http://serverip/documents/ and you call wget like this:

wget -r http://serverip/documents

It will get everything inside documents, but it will also go up to .. and download every file it can reach by following links on that server (obviously this is usually not your intention).

Another thing to watch out for is trying to use multiple sessions to traverse the same directory.
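By default, a recursive wget overwrites in place any local file that it downloads again. The -nc option stops that by skipping any file that already exists locally, but I prefer the -N option, which compares the timestamp and size of the local and remote files. If the remote file is newer or the sizes differ (for example, after an interrupted download), it fetches the file again; if they are the same, it skips it (it doesn't compare by checksum though). Note that -N downloads the whole file again rather than resuming a partial one; resuming is what -c is for. I think -N is what most will find makes sense for them.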
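
The rule -N applies can be sketched as a small shell function. This is only an illustration of the decision as the wget manual describes it (fetch when there is no local copy, when the sizes differ, or when the remote file is newer), not wget's own code. The name should_fetch and its argument order are made up for this example.

```shell
# Illustration only: mimics the decision "wget -N" makes for one file.
# should_fetch is a hypothetical helper, not part of wget.
# Arguments: local_size local_mtime remote_size remote_mtime
# (sizes in bytes, mtimes as epoch seconds; empty local_size = no local file)
should_fetch() {
    local_size=$1
    local_mtime=$2
    remote_size=$3
    remote_mtime=$4

    # No local copy yet: always download.
    if [ -z "$local_size" ]; then
        return 0
    fi
    # Sizes differ (e.g. a partial download): download again.
    if [ "$local_size" -ne "$remote_size" ]; then
        return 0
    fi
    # Remote copy is newer: download again.
    if [ "$remote_mtime" -gt "$local_mtime" ]; then
        return 0
    fi
    # Same size and not newer: skip.
    return 1
}

# Same size, same timestamp, so the file is skipped.
if should_fetch 100 1000 100 1000; then echo "download"; else echo "skip"; fi
```

With -nc the check would be simpler still: if a local file exists at all, it is skipped.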

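Avoid traversing outside of the intended path by using -L, which follows relative links only. wget also has -np (--no-parent), which is meant specifically to stop it from ascending to the parent directory and can be added alongside -L.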
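
A minimal sketch of keeping wget below the starting directory with -np (--no-parent). According to the wget manual, the trailing slash matters for this option over HTTP: without it, wget treats the last path component as a file name rather than a directory. The helper with_trailing_slash is a made-up name, and it assumes a plain directory URL with no query string.

```shell
# Hypothetical helper: make sure a directory URL ends in "/".
# Assumes a plain directory URL (no query string or fragment).
with_trailing_slash() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

url=$(with_trailing_slash "http://serverip/documents")

# Print the command instead of running it; remove "echo" to really download.
echo wget -r -np "$url"
```

This prints the command with http://serverip/documents/ as the target, so -np has a directory to stay inside of.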


Best Way To Use wget recursively

wget -nH -N -L -r http://serverip/path

-nH means no host directory. Without it, the downloaded structure mirrors the remote path starting with a directory named after the host, which can be annoying.

E.g. without -nH it would create serverip/path/file; with -nH you get path/file.
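
To make the difference concrete, here is a rough sketch that prints where a file would end up with and without -nH. It only imitates the naming rule described above; local_path is a made-up helper, not something wget provides, and it assumes the URL has a path after the host.

```shell
# Hypothetical helper: show the local path a recursive wget would use.
# Usage: local_path URL [no_host]   (no_host=1 imitates -nH)
# Assumes the URL contains a path after the host name.
local_path() {
    url=$1
    no_host=${2:-0}
    rest=${url#*://}                  # strip the scheme: serverip/path/file
    if [ "$no_host" -eq 1 ]; then
        printf '%s\n' "${rest#*/}"    # drop the host directory: path/file
    else
        printf '%s\n' "$rest"         # keep the host directory
    fi
}

local_path "http://serverip/path/file"      # prints serverip/path/file
local_path "http://serverip/path/file" 1    # prints path/file
```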

-N turns on timestamping: if the remote file is newer, or its size differs from the local copy (an incomplete download, for example), the file is downloaded again and overwritten. Otherwise nothing is done and the file is skipped, since there's no sense in downloading the same thing again.

-L tells wget to follow relative links only, which is the behavior you probably wanted and expected by default.

-r is obvious: it means recursive, so wget follows the links it finds under the specified path and downloads them.

But even the above still does some annoying things: it will traverse as many levels as it can find, up to wget's default recursion depth of five. The -l option sets a different limit.
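
One way to rein that in is to cap the recursion depth with -l. The wrapper below is only a sketch: mirror_dir, the DRY_RUN variable and the default depth of 2 are assumptions made for this example, and -np is added on top of the command above to keep wget from climbing to the parent directory.

```shell
# Hypothetical wrapper around the command from this article.
# DRY_RUN=1 (the default here) only prints the command; set DRY_RUN=0 to run it.
mirror_dir() {
    url=$1
    depth=${2:-2}    # assumed default for this sketch; wget's own default is 5

    # Build the command as the function's positional parameters.
    set -- wget -nH -N -L -r -np -l "$depth" "$url"

    if [ "${DRY_RUN:-1}" -eq 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Only go one level deep below http://serverip/path/
mirror_dir "http://serverip/path/" 1
```

Running it with DRY_RUN=0 executes the same command instead of printing it.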
